Speech Intelligibility

A Bayesian generalized linear, latent and mixed modeling approach

Authors
Affiliations

University of Antwerp

University of Antwerp

University of Antwerp

Published

August 4, 2023

Aim

The purpose of this walk-through is to introduce and document the analyses developed for the study Speech Intelligibility: A Bayesian generalized linear, latent and mixed modeling approach (citation needed).

The R packages utilized in the production of this document can be divided in three groups. First, the packages utilized to generate this document: RColorBrewer (Neuwirth, 2022) and quarto (Allaire, Teague, Scheidegger, Xie, and Dervieux, 2022). Second, the packages used for the handling the data: stringr (H. Wickham, 2022), dplyr (Hadley Wickham, François, Henry, Müller, and Vaughan, 2023), tidyverse (Hadley Wickham et al., 2019), and reshape2 (H. Wickham, 2007). Lastly, the packages used for the Bayesian implementation: coda (Plummer, Best, Cowles, and Vines, 2006), loo (A. Vehtari et al., 2023; A. Vehtari, Gelman, and Gabry, 2017), cmdstanr (Gabry and Češnovar, 2022), rstan (Stan Development Team, 2020), runjags (Denwood, 2016), and rethinking (McElreath, 2021).

Packages

Organization

In this walk-through, Section 3 introduce various background topics that are relevant to the present study. These topics enable readers to progress smoothly through this research. Specifically, Section 3.1 provides a brief explanation of how Bayesian inference procedures work and their importance for this research. Section 3.2 is devoted to explaining the difference between two particular distributions, the normal and the beta-proportion distribution, and their role on modeling bounded data. Section 3.3 explain the (generalized) linear mixed models, elaborating on their role in modeling (non)normal clustered and bounded data. Section 3.4 illustrate the concept of measurement error and the role of latent variables to overcome the problems arising from it. Lastly, Section 3.5 explains the effects of the data distributional departures on the parameter estimates, and its importance for this research.

The specific analysis for this study are elaborated from section Section 4 onwards. Particularly, Section 4 elaborates on the research gaps in the study of intelligibility. Section 5 introduces the research questions that guide the study. Section 6 explores the data and its implications. Section 7 thoroughly develop the methods to analyze the data. Lastly, Section 8 provides answers to the research question at hand.

Interludes

Bayesian inference

Theory

Bayesian inference is an approach to statistical modeling and inference that is primarily based on the Bayes’ theorem. The procedure aims to derive appropriate inference statements about a set of parameters by revising and updating their occurrence probabilities in light of new evidence (Everitt and Skrondal, 2010). The procedure consist on defining the model assumptions in the form of a likelihood for the outcome and a set of prior distributions for the parameters of interest. This allows the derivation of a set of posterior distributions for the parameters, from which the statistical inferences are derived 1. As an example, a simple linear regression model with a parameter \beta can be encoded under the Bayesian inference paradigm in the following form:

Bayesian inference

Approach to statistical modeling and inference, that aims to derive appropriate inference statements about one or a set of parameters by revising and updating their probabilities in light of new evidence (Everitt and Skrondal, 2010).

\begin{align} P(\beta | Y, X ) &= \frac{ P( Y | \beta, X ) \cdot P( \beta ) }{ P( Y ) } \end{align} \tag{1}

where P( Y| \beta, X ) defines the likelihood of the outcome, which represents the probability distribution of the outcome Y, given the parameter \beta and covariate X. Hence, the likelihood is the probability distribution assumption that a set of observations occur, conditioned on the value of a set of parameters (Everitt and Skrondal, 2010). In other words, it describes the assumption about the underlying process that give rise to the data.

Likelihood

Probability distribution assumption that a set of observations occur, given the value of a set of parameters (Everitt and Skrondal, 2010).

P( \beta ) defines the prior distribution of the parameter \beta. A prior is a probability distribution that summarizes the available information about a parameter, prior to observing empirical data (Everitt and Skrondal, 2010). In other words, it reflects the knowledge about a parameter prior to observing any data.

Prior distribution

Probability distribution that summarize information about a random variable or parameter at a given time point, prior to observing empirical data (Everitt and Skrondal, 2010).

P( Y ) defines the probability distribution of the data, which represents the evidence of the observed empirical data.

As a result P( \beta | Y, X ), which denotes the posterior distribution of the parameter, describes the probability distribution of \beta after observing empirical data. Specifically, it combines the likelihood of the outcome and the parameter’s prior distribution, considering that empirical data have been observed.

Posterior distribution

Probability distribution that summarize information about a random variable or parameter after having obtained new information from empirical data (Everitt and Skrondal, 2010).

Before implementing the Bayesian inference procedures, two important concepts related to Equation 1 need to be understood. First, the evidence of the empirical data P(Y) serves as a normalizing constant. This is just another way of saying that the numerator in the equation is rescaled by a constant obtained from calculating P(Y). Consequently, without loosing generalization, the equation can be succinctly rewritten in the following form:

\begin{align} P(\beta | Y, X ) &\propto P( Y | \beta, X ) \cdot P( \beta ) \\ \end{align} \tag{2}

where \propto denotes the proportional symbol. This implies that the posterior distribution of \beta is proportional (up to a constant) to the multiplication of the outcome’s likelihood and the parameter’s prior distribution. This definition make the calculation of the posterior distribution easier, by separating the parameter’s updating process from the integration of new empirical data (this will be clearly seen in the code provided in Section 3.1.3).

Second, a dataset usually have multiple observations of the outcome Y and covariate X, in the form of y_{i} and x_{i}. Therefore, by law of probabilities and assuming independence among the observations, the likelihood of the full dataset can be rewritten as the product of all individual observation likelihoods. Consequently, Equation 2 can also be rewritten as follows:

\begin{align} P(\beta | Y, X ) &\propto \prod_{i=1}^{n} P( y_{i} | \beta, x_{i} ) \cdot P( \beta ) \end{align} \tag{3}

Estimation methods

Several methods within the Bayesian inference procedures can be utilized to estimate the posterior distribution of the parameter, and most of these fall into the category of Markov Chain Monte Carlo methods (MCMC). MCMC are methods to indirectly simulate random observations from probability distributions using stochastic processes (Everitt and Skrondal, 2010) 2.

Markov Chain Monte Carlo (MCMC)

Methods to indirectly simulate random observations from probability distributions using stochastic processes (Everitt and Skrondal, 2010).

However, when the parameters of interest are not large in number, a useful pedagogical method to produce the posterior distribution is the grid approximation method. Through this method, an excellent approximation of the parameter’s posterior distribution can be achieved by considering a finite candidate list of parameter values. This method is used in Section 3.1.3 to illustrate how the Bayesian inference works 3.

Grid approximation

Method to indirectly simulate random observations from low dimensional continuous probability distributions, by considering a finite candidate list of parameter values (McElreath, 2020).

How does it work?

A simple Bayesian linear regression model can be written in the following form:

\begin{align*} y_{i} &= \beta \cdot x_{i} + e_{i} \\ e_{i} &\sim \text{Normal}( 0, 1 ) \\ \beta &\sim \text{Uniform}( -20, +20 ) \end{align*} where y_{i} denotes the outcome’s observation i, \beta the expected effect of the observed covariate x_{i} on the outcome, and e_{i} the outcome’s residual in observation i. Furthermore, the model assumes the residual e_{i} is also normally distributed with mean zero and standard deviation equal to one. Lastly, prior to observe any data, it is assumed that \beta is uniformly distributed within the range of [-20,+20].

However, a more convenient generalized manner to represent the same linear regression model is as follows:

\begin{align*} y_{i} &\sim \text{Normal}( \mu_{i}, 1 ) \\ \mu_{i} &= \beta \cdot x_{i} \\ \beta &\sim \text{Uniform}( -20, +20 ) \end{align*} In this definition, the component of the Bayesian inference procedure detailed in Section 3.1.1 are more easily spotted. First, about the likelihood, the outcome is assumed to be normally distributed with mean \mu_{i} and standard deviation equal to one. Second, it is assumed \beta has a prior that is a normal distribution with mean zero and standard deviation equal to one. Additionally, the equations reveal that the mean of the outcome \mu_{i} is modeled by a linear predictor composed by the covariate x_{i} and its effect on the outcome \beta.

For illustration purposes, a simulated regression with n=100 observations was generated assuming \beta=0.2. Figure 1 shows the scatter plot of the generated data (see code below). The grid approximation method is used to generate random observations from the posterior distribution of \beta. Two noteworthy results emerge from the approach. Firstly, once the posterior distribution is generated, various summaries can be used to make inferences about the parameter of interest (refer to the code output below). Secondly, when considering a dataset with n=100 observations, the influence of the prior on the posterior distribution of \beta is negligible. Specifically, prior to observe any data, assuming that \beta could take any value within the range of [-20,+20] with equal probability (left panel of Figure 2) did not have a substantial impact on the distribution of \beta after empirical data was observed (right panel of Figure 2).

Code
set.seed(12345)
n = 100

b = 0.2
x = rnorm( n=n, mean=0, sd=1 )

mu_y = b*x
y = rnorm( n=n, mean=mu_y, sd=1 )
1
replication seed
2
simulation sample size
3
covariate effect
4
covariate simulation
5
linear predictor on outcome mean
6
outcome simulation
Code
# grid approximation
Ngp = 1000

b_cand = seq( from=-20, to=20, length.out=Ngp )

udf = function(i){ b_cand[i]*x }
mu_y = sapply( 1:length(b_cand), udf )

udf = function(i){ prod( dnorm( y, mean=mu_y[,i], sd=1 ) ) }
y_lik = sapply( 1:length(b_cand), udf )

b_prior = rep( 1/40, length(b_cand) )

b_prop = y_lik * b_prior
b_post = b_prop / sum(b_prop)
1
number of points in candidate list
2
candidate list for parameter
3
user defined function: linear predictor for each candidate
4
calculation of the linear predictor for each candidate
5
user defined function: product of individual observation likelihoods
6
outcome data likelihood
7
uniform prior distribution for parameter (min=-20, max=20)
8
proportional posterior distribution for parameter
9
posterior distribution for parameter
Code
paste0( 'true beta = ', b )

b_exp = sum( b_cand * b_post )
paste0( 'estimated beta (expectation) = ', round(b_exp, 3) )

b_max = b_cand[ b_post==max(b_post) ]
paste0( 'estimated beta (maximum probability) = ', round(b_max, 3) )

b_var = sqrt( sum( ( (b_cand-b_exp)^2 ) * b_post ) )
paste0( 'estimated beta (standard deviation) = ', round(b_var, 3) )

b_prob = sum( b_post[ b_cand > 0 ] )
paste0( 'P(estimated beta > 0) = ', round(b_prob, 3) )
1
true values for the parameter
2
expected value for the parameter
3
maximum probability value for the parameter
4
standard deviation for the parameter
5
probability that the parameter is greater than zero
[1] "true beta = 0.2"
[1] "estimated beta (expectation) = 0.299"
[1] "estimated beta (maximum probability) = 0.3"
[1] "estimated beta (standard deviation) = 0.088"
[1] "P(estimated beta > 0) = 1"
Code
plot( x, y, xlim=c(-3,3), ylim=c(-3,3),
      pch=19, col=rgb(0,0,0,alpha=0.3) )
abline( a=0, b=b, lty=2, col='blue' )
abline( a=0, b=b_exp, lty=2, col='red' )
legend( 'topleft', legend=c('true', 'expected'),
        bty='n', col=c('blue','red'), lty=rep(2,3) )
1
simulation plot

Figure 1: Outcome simulation
Code
par(mfrow=c(1,2))

plot( b_cand, b_prior, type='l', xlim=c(-1.5,1.5),
      main='Prior distribution',
      xlab=expression(beta), ylab='probability' ) 
abline( v=0, lty=2, col='gray' )

plot( b_cand, b_post, type='l', xlim=c(-1,1),
      main='Posterior distribution',
      xlab=expression(beta), ylab='probability' ) 
abline( v=c(0, b, b_exp), lty=2, 
        col=c('gray','blue','red') )
legend( 'topleft', legend=c('true', 'expected'),
        bty='n', col=c('blue','red'), lty=rep(2,3) )

par(mfrow=c(1,1))
1
prior distribution density plot
2
posterior distribution density plot

Figure 2: Bayesian inference: grid approximation

Priors and their effects

Prior to observing empirical data, assuming the parameter could take any value within within the range of [-20,+20] with equal probability is not the only prior assumption that can be made. Different levels of uncertainty associated with a parameter can be encoded by different priors. This concept illustrated with Figure 3 through Figure 5, where three different types of priors are used to encode three levels of uncertainty about the parameter \beta.

Code
# grid approximation
Ngp = 1000

post = data.frame( b_cand=seq( from=-20, to=20, length.out=Ngp ) )

ud_func = function(i){ post$b_cand[i]*x }
mu_y = sapply( 1:length(post$b_cand), ud_func )

ud_func = function(i){ prod( dnorm( y, mean=mu_y[,i], sd=1 ) ) }
y_lik = sapply( 1:length(post$b_cand), ud_func )

post$b_prior1 = rep( 1/40, length(post$b_cand) )
post$b_prior2 = dnorm( post$b_cand, mean=0, sd=0.5 )
post$b_prior3 = dnorm( post$b_cand, mean=0.2, sd=0.05 )

nam = c()
for( i in 1:3 ){
  b_prop = y_lik * post[, paste0('b_prior',i) ] 
  
  nam = c(nam, paste0('b_post',i) )
  post = cbind(post, data.frame( b_prop / sum(b_prop) ) )  
}
names(post)[5:7] = nam
1
number of points in candidate list
2
candidate list for parameter
3
user defined function: linear predictor for each candidate
4
calculation of the linear predictor for each candidate
5
user defined function: product of individual observation likelihoods
6
outcome data likelihood
7
prior 1: uniform prior distribution (min=-20, max=+20)
8
prior 2: normal prior distribution (mean=0, sd=0.5)
9
prior 3: normal prior distribution (mean=0.2, sd=0.05)
10
posterior distribution for each prior

First, the distribution depicted in Figure 3 assumes \beta \sim \text{Uniform}(-20, +20) (similar to what is observed in Section 3.1.3). The distribution does not restrain the effect of \beta to be more probable in any range within [-20, +20]. This type of distribution is commonly referred to as a non-informative prior. A non-informative prior distribution reflects the committal of a distribution to a wide range of values within a specific parameter space prior to observe any data (Everitt and Skrondal, 2010).

Non-informative priors

Prior distribution that reflects the committal of a distribution to a wide range of values within a specific parameter space prior to observe any data (Everitt and Skrondal, 2010).

Code
par(mfrow=c(1,2))

plot( post[, c('b_cand','b_prior1')], type='l',
      xlim=c(-1.5,1.5), main='Prior distribution',
      xlab=expression(beta), ylab='probability' )
abline( v=0, lty=2, col='gray' )

plot( post[, c( 'b_cand','b_post1')], type='l',
      xlim=c(-1,1), main='Posterior distribution',
      xlab=expression(beta), ylab='probability' ) 
abline( v=c(0, b), lty=2, col=c('gray','blue') )

par(mfrow=c(1,1))
1
prior distribution density plot
2
posterior distribution density plot

Figure 3: Bayesian inference: posterior distributions with non-informative prior distribution.

Second, the distribution described in Figure 4 assumes \beta \sim \text{Normal}(0, 0.5). Consequently, the effect of \beta is more probable within the range [-1,+1], with less probability associated with parameter values outside this range. This is a an example of a weakly-informative prior distribution. Weakly informative priors are distributions that exhibit no bias towards positive or negative effects, but constrain very weakly the effects of the parameter within a realistic range (McElreath, 2020).

Weakly informative priors

Prior distributions that exhibit no bias towards positive or negative effects, but constrain very weakly the effects of the parameter within a realistic range (McElreath, 2020).

Code
par(mfrow=c(1,2))

plot( post[, c('b_cand','b_prior2')], type='l',
      xlim=c(-1.5,1.5), main='Prior distribution',
      xlab=expression(beta), ylab='probability' )
abline( v=0, lty=2, col='gray' )

plot( post[, c( 'b_cand','b_post2')], type='l',
      xlim=c(-1,1), main='Posterior distribution',
      xlab=expression(beta), ylab='probability' ) 
abline( v=c(0, b), lty=2, col=c('gray','blue') )

par(mfrow=c(1,1))
1
prior distribution density plot
2
posterior distribution density plot

Figure 4: Bayesian inference: posterior distributions with weakly-informative prior distribution.

Third, the distribution described in Figure 5 assumes \beta \sim \text{Normal}(0.2, 0.05). As a result, the effect of \beta is more probable within the range [0.1,0.3], with less probability associated with parameter values outside this range. This is an example of an informative prior distribution. Informative priors are distributions that expresses specific and definite information about a parameter (McElreath, 2020).

Informative priors

Prior distributions that that expresses specific and definite information about a parameter (McElreath, 2020).

Code
par(mfrow=c(1,2))

plot( post[, c('b_cand','b_prior3')], type='l',
      xlim=c(-1.5,1.5), main='Prior distribution',
      xlab=expression(beta), ylab='probability' )
abline( v=0, lty=2, col='gray' )

plot( post[, c( 'b_cand','b_post3')], type='l',
      xlim=c(-1,1), main='Posterior distribution',
      xlab=expression(beta), ylab='probability' ) 
abline( v=c(0, b), lty=2, col=c('gray','blue') )

par(mfrow=c(1,1))
1
prior distribution density plot
2
posterior distribution density plot

Figure 5: Bayesian inference: posterior distributions with informative prior distributions.

Lastly, regarding the influence of different priors on the posterior distributions, Figure 3 and Figure 4 reveals that non-informative and weakly-informative priors have a negligible influence on the posterior distribution. Both priors result in similar posteriors. Furthermore, the figure shows the data sample size n=100 is still not enough to provide an unbiased and precise estimation of the true effect. In contrast, Figure 5 shows that, informative priors can have a meaningful influence in the posterior distribution. In this particular case, the prior helps to estimate an unbiased and more precise effect. This results shows that when the data sample size is not sufficiently large, the prior assumptions can play a significant role on obtaining appropriate parameter estimates.

What are Hyperpriors?

It is crucial to consider that for a greater modeling flexibility and a more refined representation of the parameters’ uncertainty, sometimes it is convenient to define a prior distribution in terms of hyperparameters and hyperpriors 4. Hyperparameters refer to parameters that index a family of possible prior distributions for another parameter. On the other hand, hyperpriors are prior distributions for these hyperparameters (Everitt and Skrondal, 2010).

Hyperparameters

Parameters \theta_{2} that indexes a family of possible prior distributions for another parameter \theta_{1} (Everitt and Skrondal, 2010).

Hyperpriors

Prior distributions for hyperparameters (Everitt and Skrondal, 2010).

A simple example of the use of hyperpriors would be to define the regression model shown in Section 3.1.3 in the following form:

\begin{align*} y_{i} &\sim \text{Normal}( \mu_{i}, 1 ) \\ \mu_{i} &= \beta \cdot x_{i} \\ \beta &\sim \text{Normal}( 0, \text{exp}(v) ) \\ v &\sim \text{Normal}(0, 3) \end{align*} where v define the hyperparameter for the parameter \beta, and its associated distribution define its hyperprior.

However, setting prior distributions through hyperparameters brings its own challenges 5. One notable challenge pertains to the geometry of the parameter’s sample space. This implies that specific prior probabilistic representations, defined by the same hyperparameters, exhibit simpler sample geometries compared to others 6. The re-parametrization of priors into such simpler sample geometries leads to the notion of non-centered priors. In this approach, a parameter’s prior distribution is expressed in terms of a hyperparameter, which is defined by a transformation of the original parameter of interest (Gorinova et al., 2019). By incorporating non-centered priors, researchers can ensure the reliability of certain posterior distributions within Bayesian inference procedures. To illustrate, a straightforward example of a non-centered reparametrization of a prior can be demonstrated as follows:

Non-centered priors

Expression of a parameter’s distribution in terms of an hyperparameter defined by a transformation of the original parameter of interest (Gorinova et al., 2019).

\begin{align*} y_{i} &\sim \text{Normal}( \mu_{i}, 1 ) \\ \mu_{i} &= \beta \cdot x_{i} \\ \beta &= z \cdot \text{exp}(v) \\ v &\sim \text{Normal}(0, 3) \\ z &\sim \text{Normal}( 0, 1 ) \end{align*} where z is a hyperparameter sampled independently from v, and the parameter of interest \beta is obtained as a transformation of the two hyperparameters. Figure 6 illustrates the differences in sampling geometries between a centered and a non-centered parametrization. It is evident that the sampling geometry depicted in the left panel of the figure is narrower than the one depicted in the right panel, and as a result, Bayesian inference procedures have an harder time sampling from the former than the latter distributions.

Code
n = 5000

v = rnorm( n=n, mean=0, sd=1 )
z = rnorm( n=n, mean=0, sd=1 )

b_cent = rnorm( n=n, mean=0, sd=exp(v) )
b_non = z*exp(v)
1
simulation sample size
2
hyperparameter simulation
3
centered parametrization simulation
4
non-centered parameterization simulation
Code
par( mfrow=c(1,2) )

plot( b_cent, v, pch=19, col=rgb(0,0,0,alpha=0.1),
      xlab=expression(beta), ylab=expression(v[b]),
      main='Centered parametrization' ) 

plot( z, v, pch=19, col=rgb(0,0,0,alpha=0.1),
      xlab=expression(z[b]), ylab=expression(v[b]),
      main='Non-centered parametrization' )

par( mfrow=c(1,1) )
1
plot of centered parametrization
2
plot of non-centered parametrization

Figure 6: Centered and non-centered parameter spaces

Importance

Three characteristics of Bayesian inference makes it relevant to the present study. Firstly, they have shown superior performance in dealing with complex and highly-parameterized models (Baker, 1998; Kim and Cohen, 1999). Secondly, they are more suitable for small sample sizes (Baldwin and Fellingham, 2013; Depaoli, 2014; Lambert, Sutton, Burton, Abrams, and Jones, 2006). Lastly, they have an ability to incorporate prior information to constraint parameters within a permissible space (e.g., positive variances), thus preventing common estimation problems associated with classical methods (Martin and McDonald, 1975; Seaman and Stamey, 2011).

Benefits of Bayesian inference procedures

More suitable to deal with:

  1. Complex or highly-parameterized model
  2. Small sample sizes
  3. Parameter’s constraints.

A tale of two distributions

The normal distribution

A normal distribution is a type of continuous probability distribution in which a random variable can take on values along the real line \left( y_{i} \in [-\infty, \infty] \right). The distribution is characterized by two independent parameters: the mean \mu and the standard deviation \sigma (Everitt and Skrondal, 2010). Thus, a random variable can take on values that are gathered around a mean \mu, with some values dispersed based on some amount of deviation \sigma, without any restriction. Importantly, by definition of the normal distribution, the location (mean) of the distribution does not influence its spread (deviation).

Figure 7 illustrates how the distribution of an outcome changes with different values of \mu and \sigma. The left panel demonstrate that the distribution of the outcome can shift in terms of its location based on the value of \mu. The right panel shows how the distribution of the outcome can become narrower or wider based on the values of \sigma. It is noteworthy that alterations in the mean \mu of the distribution have no impact on its standard deviation \sigma.

Code
require(rethinking)

mu = c(-1.5, 0, 1.5)
sigma = c(1.5, 1, 0.5) 

par(mfrow=c(1,2))

cp = sapply( 1:length(mu), col.alpha, alpha=0.7) 
for(i in 1:length(mu)){
  if(i==1){
    curve( dnorm(x, mean=mu[i], sd=1),
           from=-3, to=3, ylim=c(0,1.5), lwd=2, col=cp[i], 
           xlab="outcome values", ylab="density")
    abline(v=mu, col='gray', lty=2)
    legend('topleft', col=c(cp,'gray'), lwd=2, bty='n',
           legend=expression( mu[1]==-1.5,
                              mu[2]==0,
                              mu[3]==+1.5,
                              sigma==1) )
  } else{
    curve( dnorm(x, mean=mu[i], sd=1),
           from=-3, to=3, ylim=c(0,1.5), lwd=2, col=cp[i], 
           xlab="", ylab="", add=T )
  }
}


cp = sapply( 1:length(sigma), col.alpha, alpha=0.7)
for(i in 1:length(sigma)){
  if(i==1){
    curve( dnorm(x, mean=0, sd=sigma[i]),
           from=-3, to=3, ylim=c(0,1.5), lwd=2, col=cp[i],
           xlab="outcome values", ylab="density")
    abline(v=0, col='gray', lty=2)
    legend('topleft', col=c(cp,'gray'), lwd=2, bty='n',
           legend=expression( sigma[1]==1.5,
                              sigma[2]==1,
                              sigma[3]==0.5,
                              mu==0) )
  } else{
    curve( dnorm(x, mean=0, sd=sigma[i]), 
           from=-3, to=3, ylim=c(0,1.5), lwd=2, col=cp[i], 
           xlab="", ylab="", add=T )
  }
}

par(mfrow=c(1,1))
1
required package
2
parameter to plot: means and standard deviations
3
plotting normal distribution with different ‘mu’ and ‘sigma=1’
4
plotting normal distribution with ‘mu=0’ and different sigma’s

Figure 7: Normal distribution with different mean and standard deviations

The beta-proportion distribution

A beta-proportion distribution is a type of continuous probability distribution in which a random variable can assume values within the continuous interval between zero and one \left( y_{i} \in [0, 1] \right). The distribution is characterized by two parameters: the mean \mu and the sample size M (Everitt and Skrondal, 2010). This implies that a random variable can take on values restricted within the unit interval, centered around a mean \mu, with some values being more dispersed based on the sample size M. Additionally, two characteristic define the distribution. Firstly, like the random variable, the mean of the distribution can only take values within the unit interval (\mu \in [0,1]). Secondly, the mean and sample size parameters are no longer independent of each other.

Figure 8 illustrates how an outcome with a beta-proportion distribution changes with different values of \mu and M. The figure reveals two prevalent patterns in the distribution: (1) the behavior of the dispersion, as measured by the sample size, depends on the mean of the distribution, and (2) the larger the sample size, the less dispersed the distribution is within the unit interval.

Code
require(rethinking)

mu = c(0.2, 0.5, 0.8)
M = c(2, 5, 20) 

par(mfrow=c(1,2))

cp = sapply( 1:length(mu), col.alpha, alpha=0.7) 
for(i in 1:length(mu)){
  if(i==1){
    curve( dbeta2(x, prob=mu[i], theta=10),
           from=0, to=1, ylim=c(0,8), lwd=2, col=cp[i], 
           xlab="outcome values", ylab="density")
    abline(v=mu, col='gray', lty=2)
    legend('topleft', col=c(cp,'gray'), lwd=2, bty='n',
           legend=expression( mu[1]==0.2,
                              mu[2]==0.5,
                              mu[3]==0.8,
                              M==10) )
  } else{
    curve( dbeta2(x, prob=mu[i], theta=10),
           from=0, to=1, ylim=c(0,8), lwd=2, col=cp[i], 
           xlab="", ylab="", add=T )
  }
}


cp = sapply( 1:length(M), col.alpha, alpha=0.7)
for(i in 1:length(M)){
  if(i==1){
    curve( dbeta2(x, prob=0.3, theta=M[i]),
           from=0, to=1, ylim=c(0,8), lwd=2, col=cp[i], 
           xlab="outcome values", ylab="density")
    abline(v=0.3, col='gray', lty=2)
    legend('topleft', col=c(cp,'gray'), lwd=2, bty='n',
           legend=expression( M[1]==2,
                              M[2]==5,
                              M[3]==20,
                              mu==0.3) )
  } else{
    curve( dbeta2(x, prob=0.3, theta=M[i]),
           from=0, to=1, ylim=c(0,8), lwd=2, col=cp[i], 
           xlab="", ylab="", add=T )
  }
}

par(mfrow=c(1,1))
1
required package
2
parameter to plot: means and ‘sample size’
3
plotting beta-proportion distribution with different ‘mu’ and ‘M=10’
4
plotting beta-proportion distribution with ‘mu=0.5’ and different M’s

Figure 8: Beta-proportion distribution with different mean and sample sizes

Importance

The assumption of normally distributed outcomes is ubiquitous in speech intelligibility research (see Boonen, Kloots, Nurzia, and Gillis, 2021; Flipsen, 2006; Lagerberg, Asberg, Hartelius, and Persson, 2014). Therefore, it is crucial to comprehend what signifies for an outcome to follow a normal distribution.

In contrast, the significance of the beta-proportion distribution lies in providing a suitable alternative for modeling non-normally bounded distributed outcomes, such as the entropy scores utilized in this study. Boundedness refers to the restriction of data values within specific bounds or intervals, beyond which they cannot occur (Lebl, 2022). Neglecting the bounded nature of an outcome can lead to underfitting. Underfitting occurs when the statistical model fails to capture the underlying patterns or complexity of the data. In such scenario, a model may generate predictions outside the data range, which can be physically inconsistent or highly unlikely (Everitt and Skrondal, 2010), resulting in an inability to generalize its results when confronted with new data.

Boundedness

Refers to the restriction of data values within specific bounds or intervals, beyond which they cannot occur (Lebl, 2022)

Underfitting

When the statistical model fails to capture the underlying patterns or complexity of the data (Everitt and Skrondal, 2010).

Linear Mixed Models

The ordinary LMM

An ordinary linear mixed model (LMM) is a procedure employed to estimate a linear relationship between the mean of a normally distributed outcome with clustered observations, and one or more covariates (Holmes, Bolin, and Kelley, 2019). A commonly know Bayesian probabilistic representation of an ordinary LMM can be expressed as follows:

Ordinary linear mixed model (LMM)

Procedure employed to estimate a linear relationship between the mean of a normally distributed outcome with clustered observations, and one or more covariates (Holmes et al., 2019).

\begin{align*} y_{ib} &= \beta x_{i} + a_{b} + \varepsilon_{ib} \\ \varepsilon_{ib} &\sim \text{Normal}(0, 1) \\ \beta &\sim \text{Normal}(0, 0.5) \\ a_{b} &\sim \text{Normal}(0, 1) \end{align*}

where y_{ib} denotes the outcome’s i’th observation clustered in block b, and x_{i} denotes the covariate for observation i. Moreover, \beta denote the fixed slope of the regression. Furthermore, a_{b} denotes the random effects, and \varepsilon_{ib} defines the random outcome residuals. Furthermore, the residuals \varepsilon_{ib} are assumed to be normally distributed with mean zero and standard deviation equal to one. Additionally, prior to observing any data, \beta is assumed to be normally distributed with mean zero and standard deviation equal to 0.5. Similarly, a_{b} is assumed to be normally distributed with mean zero and standard deviation equal to one.

The generalized LMM

A generalized linear mixed model (GLMM) are a set of models used to estimate (non)linear relationship between the mean of a (non)normally distributed outcome with clustered observations, and one or more covariates (Lee and Nelder, 1996). Interestingly, the ordinary Bayesian LMM detailed in the previous section can be represented as a special case of GLMM, as follows:

Generalized linear mixed model (GLMM)

Procedure employed to estimate (non)linear relationship between the mean of a (non)normally distributed outcome with clustered observations, and one or more covariates (Lee and Nelder, 1996).

\begin{align*} y_{ib} &\sim \text{Normal}( \mu_{ib}, 1) \\ \mu_{ib} &= \beta x_{i} + a_{b} \\ \beta &\sim \text{Normal}(0, 0.5) \\ a_{b} &\sim \text{Normal}(0, 1) \\ \end{align*}

Notice this representation explicitly highlights the four components of a Bayesian GLMM: the likelihood component, the linear predictor, the link function and the priors (McElreath, 2020). The likelihood component specifies the assumption about the distribution of an outcome, in this case a normal distribution with mean \mu_{ib} and standard deviation equal to one. The linear predictor specifies the manner in which the covariate will predict the mean of the outcome. In this case the linear predictor is a linear combination of the parameter \beta, the covariate x_{i}, and the random effects a_{b}. The link function specifies the relationship between the mean of the outcome \mu_{ib} and the linear predictor. In this case no transformation is applied to the linear predictor to match its range with the range of the outcome, as both can take on values within the real line (refer to Section 3.2.1). Lastly, the priors describe what is known about the parameters \beta and a_{b} before observing any empirical data.

GLMM components

  1. Likelihood component
  2. Linear predictor
  3. Link function
  4. Prior distributions

On the other hand, a beta-proportion LMM is also a GLMM, and it can be represented probabilistically as follows:

\begin{align*} y_{ib} &\sim \text{BetaProp}( \mu_{ib}, 10 ) \\ \mu_{ib} &= \text{logit}^{-1}( \beta x_{i} + a_{b} ) \\ \beta &\sim \text{Normal}(0, 0.5) \\ a_{b} &\sim \text{Normal}(0, 1) \\ \end{align*}

Notice the representation also highlights the four components of a Bayesian GLMM; however, their assumptions are now slightly different. The likelihood component assumes a beta-proportion distribution for the outcome with mean \mu_{ib} and sample size equal to 10. The linear predictor is still a linear combination of the parameter \beta, the covariate x_{i}, and the random intercepts a_{b}. However, the link function now assumes the mean of the outcome is (non)linearly related to the linear predictor by a inverse-logit function: \text{logit}^{-1}(x) = exp(x) / (1+exp(x)). The inverse-logit function allows the linear predictor to match the range observed in the mean of the beta-proportion distribution \mu_{ib} \in [0,1] (refer to Section 3.2.2). Lastly, the prior assumptions for \beta and a_{b} are also declared.

Importance

Understanding LMM is essential due to the ubiquitous assumption of normally distributed outcomes within the speech intelligibility research field (see Boonen et al., 2021; Flipsen, 2006; Lagerberg et al., 2014). Furthermore, their significance also lies in their ability to model clustered outcomes. Clustering occurs when multiple observations arise from the same individual, location, or time (McElreath, 2020). Accounting for data clustering is essential, as disregarding it may result in biased and inefficient parameter estimates. Consequently, such biases and inefficiencies can diminish statistical power or increase the likelihood of committing a type I error. Statistical power defines the model’s ability to reject the null hypothesis when it is false (Everitt and Skrondal, 2010). Type I error occurs when a null hypothesis is erroneously rejected (Everitt and Skrondal, 2010).

Clustering

Occurs when multiple observations arise from the same individual, location, or time (McElreath, 2020).

Statistical power

The model’s ability to reject the null hypothesis when it is false (Everitt and Skrondal, 2010).

Type I error

The error that results when a null hypothesis is erroneously rejected (Everitt and Skrondal, 2010).

Moreover, the significance of GLMM lies in offering the same benefits as the LMMs, in terms of parameter unbiasedness and efficiency. However, the framework also allows for the modeling of (non)linear relationships of (non)normally distributed outcomes. This is particularly important for modeling bounded data, such as the entropy scores utilized in this study. Refer to Section 3.2.3 to understand the importance of considering the bounded nature of the data in the modeling process.

Measurement error in an outcome

What is the problem?

The problem of measurement error in an outcome is easier to understand with a motivating example. Using a similar model as the one depicted in Section 3.1.3, the probabilistic representation of measurement error in the outcome can be depicted as follows:

\begin{align*} \tilde{y}_{i} &\sim \text{Normal}( y_{i}, s ) \\ y_{i} &\sim \text{Normal}( \mu_{i}, 1 ) \\ \mu_{i} &= \beta \cdot x_{i} \\ \beta &\sim \text{Uniform}( -20, 20 ) \end{align*} This representation effectively means that a manifest outcome \tilde{y}_{i} is assumed to be normally distributed with a mean equal to the latent outcome y_{i} and a measurement error s. The latent outcome y_{i} is also assumed to be normally distributed but with a mean \mu_{i} and a standard deviation of one. The mean of the latent outcome is considered to be explained by a linear combination of the covariate x_{i} and its expected effect \beta. Lastly, prior to observing any data, \beta is assumed to follow a uniform distribution within the range of [-20, +20], representing a non-informative prior.

For illustrative purposes, a simulated outcome with n=100 observations was generated, assuming \beta=0.2, and a measurement error of s=2. Figure 9 shows the scatter plot of the generated data (see code below). The left panel of the figure demonstrates that the manifest outcome has a larger spread than the latent outcome depicted in the right panel. As a result, although \beta is expected to be estimated in an unbiased manner, the statistical hypothesis tests for the parameter will likely be affected due to this larger variability.

The estimation output confirms the previous hypothesis. The posterior distribution of \beta, estimated using the manifest outcome, has a larger standard deviation than the one estimated using the appropriate latent outcome (see Figure 10 and code output below). Furthermore, the code output shows the parameter’s posterior distribution can no longer reject the null hypothesis at confidence levels of 90\% and 95\%, indicating a reduced statistical power.

Code
set.seed(12345)
n = 100

b = 0.2
x = rnorm( n=n, mean=0, sd=1 )

mu_y = b*x
y = rnorm( n=n, mean=mu_y, sd=1 )

s = 2
y_tilde = rnorm( n=n, mean=y, sd=s )
1
replication seed
2
simulation sample size
3
covariate effect
4
covariate simulation
5
linear predictor on outcome mean
6
latent outcome simulation
7
measurement error
8
manifest outcome simulation
Code
# grid approximation
Ngp = 1000

b_cand = seq( from=-20, to=20, length.out=Ngp )

udf = function(i){ b_cand[i]*x }
mu_y = sapply( 1:length(b_cand), udf )

udf = function(i){ prod( dnorm( y_tilde, mean=mu_y[,i], sd=s ) ) }
y_lik_man = sapply( 1:length(b_cand), udf )

udf = function(i){ prod( dnorm( y, mean=mu_y[,i], sd=1 ) ) }
y_lik_lat = sapply( 1:length(b_cand), udf )

b_prior = rep( 1/40, length(b_cand) )

b_prop_man = y_lik_man * b_prior
b_post_man = b_prop_man / sum(b_prop_man)

b_prop_lat = y_lik_lat * b_prior
b_post_lat = b_prop_lat / sum(b_prop_lat)
1
number of points in candidate list
2
candidate list for parameter
3
user defined function: linear predictor for each candidate
4
calculation of the linear predictor for each candidate
5
user defined function: product of individual observation likelihoods for manifest outcome
6
manifest outcome data likelihood
7
user defined function: product of individual observation likelihoods for latent outcome
8
latent outcome data likelihood
9
uniform prior distribution for parameter, on manifest and latent outcomes
10
proportional posterior distribution for parameter on manifest outcome
11
posterior distribution for parameter on manifest outcome
12
proportional posterior distribution for parameter on latent outcome
13
posterior distribution for parameter on latent outcome
Code
paste0( 'true beta = ', b )

# manifest outcome
b_exp_man = sum( b_cand * b_post_man )
paste0( 'estimated beta (expectation on manifest) = ', 
        round(b_exp_man, 3) )

b_var_man = sqrt( sum( ( (b_cand-b_exp_man)^2 ) * b_post_man ) )
paste0( 'estimated beta (standard deviation on manifest) = ', 
        round(b_var_man, 3) )


# latent outcome
b_exp_lat = sum( b_cand * b_post_lat )
paste0( 'estimated beta (expectation on latent) = ', 
        round(b_exp_lat, 3) )

b_var_lat = sqrt( sum( ( (b_cand-b_exp_lat)^2 ) * b_post_lat ) )
paste0( 'estimated beta (standard deviation on latent) = ', 
        round(b_var_lat, 3) )

# null hypothsis rejection
b_prob_man = sum( b_post_man[ b_cand > 0 ] )
paste0( 'P(estimated beta on manifest > 0) = ', 
        round(b_prob_man, 3) )

b_prob_lat = sum( b_post_lat[ b_cand > 0 ] )
paste0( 'P(estimated beta on latent > 0) = ', 
        round(b_prob_lat, 3) )
1
true values for the parameter
2
expected value for the parameter on manifest outcome
3
standard deviation for the parameter on manifest outcome
4
expected value for the parameter on latent outcome
5
standard deviation for the parameter on latent outcome
6
probability that the parameter is greater than zero, on manifest outcome
7
probability that the parameter is greater than zero, on latent outcome
[1] "true beta = 0.2"
[1] "estimated beta (expectation on manifest) = 0.212"
[1] "estimated beta (standard deviation on manifest) = 0.176"
[1] "estimated beta (expectation on latent) = 0.299"
[1] "estimated beta (standard deviation on latent) = 0.088"
[1] "P(estimated beta on manifest > 0) = 0.887"
[1] "P(estimated beta on latent > 0) = 1"
Code
par( mfrow=c(1,2) )

plot( x, y_tilde, xlim=c(-3,3), ylim=c(-7,7),
      pch=19, col=rgb(0,0,0,alpha=0.3),
      ylab=expression(tilde(y)),
      main='manifest outcome' )
abline( a=0, b=b, lty=2, col='blue')
abline( a=0, b=b_exp_man, lty=2, col='red' )
legend( 'topleft', legend=c('true', 'expected'),
        bty='n', col=c('blue','red'), lty=rep(2,3) )

plot( x, y, xlim=c(-3,3), ylim=c(-7,7),
      pch=19, col=rgb(0,0,0,alpha=0.3),
      main='latent outcome' )
abline( a=0, b=b, lty=2, col='blue')
abline( a=0, b=b_exp_lat, lty=2, col='red' )
legend( 'topleft', legend=c('true', 'expected'),
        bty='n', col=c('blue','red'), lty=rep(2,3) )

par( mfrow=c(1,1) )
1
simulation plot of manifest outcome
2
simulation plot of latent outcome

Figure 9: Measurement error simulation
Code
par(mfrow=c(1,2))

plot( b_cand, b_post_man, type='l', xlim=c(-0.5,1),
      main='Posterior on manifest outcome',
      xlab=expression(beta), ylab='probability' ) 
abline( v=c(0, b, b_exp_man), lty=2, col=c('gray', 'blue', 'red') )
legend( 'topleft', legend=c('true', 'expected'),
        bty='n', col=c('blue','red'), lty=rep(2,3) )

plot( b_cand, b_post_lat, type='l', xlim=c(-0.5,1),
      main='Posterior on latent outcome',
      xlab=expression(beta), ylab='probability' ) 
abline( v=c(0, b, b_exp_lat), lty=2, col=c('gray', 'blue','red') )
legend( 'topleft', legend=c('true', 'expected'),
        bty='n', col=c('blue','red'), lty=rep(2,3) )

par(mfrow=c(1,1))
1
prior distribution density plot
2
posterior distribution density plot

Figure 10: Bayesian inference: grid approximation on measurement error outcomes

How to solve it?

Latent variables can be used to address the problem arising from the larger observed variability in one or more manifest outcomes. A latent variable is a variable that cannot be directly measured but is assumed to be primarily responsible for the variability in one or more manifest variables (Everitt and Skrondal, 2010). Latent variables can be interpreted as hypothetical constructs, traits, or true variables that account for the variability that induce dependence in one or more manifest variables (Rabe-Hesketh, Skrondal, and Pickles, 2004a). This concept is akin to a linear mixed model, where the random effects serve to account for the variability that induces dependence within clustered outcomes (Rabe-Hesketh et al., 2004a) (refer to Section 3.3). The most widely known examples of latent variable models include Confirmatory Factor Analysis and Structural Equation Models (CFA and SEM, respectively).

Latent variables

Variables that cannot be measured directly but are assumed to be the principal responsible for the common variability in one or more manifest variables (Everitt and Skrondal, 2010).

Commonly, latent variable models consist of two parts: a measurement part and a structural part. In the measurement part, the principles of the Thurstonian model (Luce, 1959; Thurstone, 1927) are employed to aggregate one or more manifest variables and estimate a latent variable. In the structural part, regression-like relationships among latent and other manifest variables are specified, allowing researchers to test hypotheses about their (causal) relationships (Hoyle, 2014). While the measurement part is sometimes of interest in its own right, the substantive model of interest is often defined by the structural part (Rabe-Hesketh et al., 2004a).

Importance

It becomes evident that when an outcome is measured with error, the estimation procedures based on standard assumptions yield inefficient parameter estimates. This implies that the parameters are not estimated with sufficient precision. Consequently, such inefficiency can reduce statistical power and increase the likelihood of committing a type II error, which occurs when a null hypothesis is erroneously accepted (Everitt and Skrondal, 2010).

Type II error

The error that results when a null hypothesis is erroneously accepted (Everitt and Skrondal, 2010).

Therefore, the issue of measurement error in an outcome is highly relevant to this study. This research assumes that a speaker’s (latent) potential intelligibility contributes, in part, to the observed variability in the speaker’s (manifest) entropy scores. Given the interest in testing hypotheses about the potential intelligibility of speakers, and considering that the entropy scores are subject to measurement error, it becomes necessary to use latent variables to generate precise parameter estimates to test the hypothesis of interest.

Distributional departures

Heteroscedasticity

In the context of regression analysis, heteroscedasticity occurs when the variance of an outcome depends on the values of another variable. The opposite case is called homoscedasticity. An example of heteroscedasticity can be probabilistically represented as follows:

Heteroscedasticity

Occurs when the variance (standard deviation) of an outcome depends on the values of another variable. The opposite case is called homoscedasticity (Everitt and Skrondal, 2010).

\begin{align*} y_{i} &\sim \text{Normal}( \mu_{i}, \sigma_{i} ) \\ \mu_{i} &= \beta \cdot x_{i} \\ \sigma_{i} &= exp( \gamma \cdot x_{i} ) \\ \beta &\sim \text{Uniform}( -20, 20 ) \\ \gamma &\sim \text{Uniform}( -20, 20 ) \end{align*} This representation implies that an outcome y_{i} is assumed normally distributed with mean \mu_{i} and a standard deviation \sigma_{i}. Furthermore, the mean and standard deviation of the outcome is explained by the covariate x_{i}, through the parameters \beta and \gamma. Lastly, prior to observing any data, \beta and \gamma are assumed to be uniformly distributed in the range of [-20,+20].

Figure 11 illustrate the presence of heteroscedasticity using the previous representation, assuming a sample size of n=100, and parameters \beta=0.2 and \gamma=1. Notice the variability of the outcome increases as the covariate also increases. Consequently, it is easy to intuit that this difference in the outcome’s variability could have and impact on the statistical hypothesis tests of \beta, and even in the estimate itself. To prove the intuition, an incorrect model is used to estimate \beta.

\begin{align*} y_{i} &\sim \text{Normal}( \mu_{i}, 1 ) \\ \mu_{i} &= \beta \cdot x_{i} \\ \beta &\sim \text{Uniform}( -20, 20 ) \\ \end{align*} As a result, the hypotheses are proven accurate. When an outcome is erroneously assumed homoscedastic, the parameter estimates not only become inefficient but also are not estimated closer to the true value, as seen in the output code below and in Figure 12.

Code
set.seed(12345)
n = 100

b = 0.2
g = 1

x = rnorm( n=n, mean=0, sd=1 )

mu_y = b*x
s_y = exp(g*x) 

y = rnorm( n=n, mean=mu_y, sd=s_y )
1
replication seed
2
simulation sample size
3
beta and gamma effects
4
covariate simulation
5
(non)linear predictor on outcome mean and standard deviation
6
outcome simulation
Code
# grid approximation
Ngp = 1000

b_cand = seq( from=-20, to=20, length.out=Ngp )

udf = function(i){ b_cand[i]*x }
mu_y = sapply( 1:length(b_cand), udf )

udf = function(i){ prod( dnorm( y, mean=mu_y[,i], sd=1 ) ) }
y_lik = sapply( 1:length(b_cand), udf )

b_prior = rep( 1/40, length(b_cand) )

b_prop = y_lik * b_prior
b_post = b_prop / sum(b_prop)
1
number of points in candidate list
2
candidate list for parameter
3
user defined function: linear predictor for each candidate
4
calculation of the linear predictor for each candidate
5
user defined function: product of individual observation likelihoods
6
outcome data likelihood
7
uniform prior distribution for parameter (min=-20, max=20)
8
proportional posterior distribution for parameter
9
posterior distribution for parameter
Code
paste0( 'true beta = ', b )

b_exp = sum( b_cand * b_post )
paste0( 'estimated beta (expectation) = ', round(b_exp, 3) )

b_max = b_cand[ b_post==max(b_post) ]
paste0( 'estimated beta (maximum probability) = ', round(b_max, 3) )

b_var = sqrt( sum( ( (b_cand-b_exp)^2 ) * b_post ) )
paste0( 'estimated beta (standard deviation) = ', round(b_var, 3) )

b_prob = sum( b_post[ b_cand > 0 ] )
paste0( 'P(estimated beta > 0) = ', round(b_prob, 3) )
1
true values for the parameter
2
expected value for the parameter
3
maximum probability value for the parameter
4
standard deviation for the parameter
5
probability that the parameter is greater than zero
[1] "true beta = 0.2"
[1] "estimated beta (expectation) = 0.638"
[1] "estimated beta (maximum probability) = 0.621"
[1] "estimated beta (standard deviation) = 0.088"
[1] "P(estimated beta > 0) = 1"
Code
plot( x, y, xlim=c(-3,3), ylim=c(-6,6),
      pch=19, col=rgb(0,0,0,alpha=0.3) )
abline( a=0, b=b, lty=2, col='blue')
abline( a=0, b=b_exp, lty=2, col='red' )
abline( a=-4, b=-1, lty=2)
abline( a=4.4, b=1.5, lty=2)
legend( 'topleft', legend=c('true', 'expected'),
        bty='n', col=c('blue','red'), lty=rep(2,3) )
1
scatter plot of an heteroscedastic outcome

Figure 11: Heteroscedasticity simulation
Code
par(mfrow=c(1,2))

plot( b_cand, b_prior, type='l', xlim=c(-1.5,1.5),
      main='Prior distribution',
      xlab=expression(beta), ylab='probability' ) 
abline( v=0, lty=2, col='gray' )

plot( b_cand, b_post, type='l', xlim=c(-1,1),
      main='Posterior distribution',
      xlab=expression(beta), ylab='probability' ) 
abline( v=c(0, b, b_exp, b_max), lty=2, 
        col=c('gray','blue', rainbow(2)) )
legend( 'topleft', legend=c('true', 'expected', 'max prob.'),
        bty='n', col=c('blue',rainbow(2)), lty=rep(2,3) )

par(mfrow=c(1,1))
1
prior distribution density plot
2
posterior distribution density plot

Figure 12: Bayesian inference: grid approximation

Outliers

In regression analysis, outliers are defined as observations that appear to deviate markedly from other sample data points in which they occur (Everitt and Skrondal, 2010). Although no unique probabilistic representation of outliers can be represented, a simple example can be illustrated with Figure 13. The figure depicts the presence of three influential observations in the outcome (colored blue). It is easier to intuit that with the presence of influential observations the parameter estimates, and the hypothesis test resulting from them, can be affected.

Outlier

Observation that appear to deviate markedly from other sample data points in which it occurs (Everitt and Skrondal, 2010).

The intuition is proven correct when \beta is estimated using the same incorrect model used in Section 3.5.1. When an outcome is erroneously assumed without outliers, the parameter value is estimated farther from the truth, as observed in the code output below and in Figure 14.

\begin{align*} y_{i} &\sim \text{Normal}( \mu_{i}, 1 ) \\ \mu_{i} &= \beta \cdot x_{i} \\ \beta &\sim \text{Uniform}( -20, 20 ) \\ \end{align*}

Code
set.seed(12345)
n = 100

b = 0.2
x = rnorm( n=n, mean=0, sd=1 )

mu_y = b*x
y = rnorm( n=n, mean=mu_y, sd=1 )

idx = which( x>1 )
sel = 1:3
y[idx[sel]] = 6 
1
replication seed
2
simulation sample size
3
beta effects
4
covariate simulation
5
linear predictor on outcome mean
6
outcome simulation
7
outlier simulation
Code
# grid approximation
Ngp = 1000

b_cand = seq( from=-20, to=20, length.out=Ngp )

udf = function(i){ b_cand[i]*x }
mu_y = sapply( 1:length(b_cand), udf )

udf = function(i){ prod( dnorm( y, mean=mu_y[,i], sd=1 ) ) }
y_lik = sapply( 1:length(b_cand), udf )

b_prior = rep( 1/40, length(b_cand) )

b_prop = y_lik * b_prior
b_post = b_prop / sum(b_prop)
1
number of points in candidate list
2
candidate list for parameter
3
user defined function: linear predictor for each candidate
4
calculation of the linear predictor for each candidate
5
user defined function: product of individual observation likelihoods
6
outcome data likelihood
7
uniform prior distribution for parameter (min=-20, max=20)
8
proportional posterior distribution for parameter
9
posterior distribution for parameter
Code
paste0( 'true beta = ', b )

b_exp = sum( b_cand * b_post )
paste0( 'estimated beta (expectation) = ', round(b_exp, 3) )

b_max = b_cand[ b_post==max(b_post) ]
paste0( 'estimated beta (maximum probability) = ', round(b_max, 3) )

b_var = sqrt( sum( ( (b_cand-b_exp)^2 ) * b_post ) )
paste0( 'estimated beta (standard deviation) = ', round(b_var, 3) )

b_prob = sum( b_post[ b_cand > 0 ] )
paste0( 'P(estimated beta > 0) = ', round(b_prob, 3) )
1
true values for the parameter
2
expected value for the parameter
3
maximum probability value for the parameter
4
standard deviation for the parameter
5
probability that the parameter is greater than zero
[1] "true beta = 0.2"
[1] "estimated beta (expectation) = 0.477"
[1] "estimated beta (maximum probability) = 0.46"
[1] "estimated beta (standard deviation) = 0.088"
[1] "P(estimated beta > 0) = 1"
Code
plot( x, y, xlim=c(-3,3), ylim=c(-6,6),
      pch=19, col=rgb(0,0,0,alpha=0.3) )
points( x[idx[sel]], y[idx[sel]],
        pch=19, col=rgb(0,0,1,alpha=0.3) )
abline( a=0, b=b, lty=2, col='blue')
abline( a=0, b=b_exp, lty=2, col='red' )
legend( 'topleft', legend=c('true', 'expected'),
        bty='n', col=c('blue','red'), lty=rep(2,3) )
1
scatter plot of an outcome with outliers

Figure 13: Outliers simulation
Code
par(mfrow=c(1,2))

plot( b_cand, b_prior, type='l', xlim=c(-1.5,1.5),
      main='Prior distribution',
      xlab=expression(beta), ylab='probability' ) 
abline( v=0, lty=2, col='gray' )

plot( b_cand, b_post, type='l', xlim=c(-1,1),
      main='Posterior distribution',
      xlab=expression(beta), ylab='probability' ) 
abline( v=c(0, b, b_exp, b_max), lty=2, 
        col=c('gray','blue', rainbow(2)) )
legend( 'topleft', legend=c('true', 'expected', 'max prob.'),
        bty='n', col=c('blue',rainbow(2)), lty=rep(2,3) )

par(mfrow=c(1,1))
1
prior distribution density plot
2
posterior distribution density plot

Figure 14: Bayesian inference: grid approximation

Solution

As recommended by McElreath (2020), robust regression models can be used to deal with these types of distributional departures. Robust regression models are a general class of statistical procedures designed to reduce the sensitivity of the parameter estimates to mild or moderate failures in the assumption of a model (Everitt and Skrondal, 2010). The procedure consist on modifying the statistical models to include traits that effectively make them robust to small departures from the distributional assumption, like heteroscedastic errors, or to the presence of outliers.

Robust regression models

A general class of statistical procedures designed to reduce the sensitivity of the parameter estimates to mild or moderate failures in the assumption of a model (Everitt and Skrondal, 2010).

Importance

It is known that dealing with heteroscedasticity and the identification of outlier through preliminary univariate procedures is prone to the erroneous transformation or exclusion of valuable information. This can ultimately bias the parameter estimates, and even make them inefficient (McElreath, 2020). Bias refer to the extent to which the statistical method used in a study does not estimate the quantity thought to be estimated (Everitt and Skrondal, 2010).

Bias

It refer to the extent to which the statistical method used in a study does not estimate the quantity thought to be estimated (Everitt and Skrondal, 2010).

Dealing with the possibility of heteroscedasticity or outlying observations is relevant to the present study, because there is an interest in testing hypotheses about the potential intelligibility of speakers. Therefore, it is a necessity to considering the possibility of using robust regression models to assess these distributional departures and generate unbiased parameter estimates.

Research gaps

The process of decoding the words in a message, known as intelligibility, is crucial for understanding the intended meaning or purpose of a message. Speech intelligibility refers to the extent to which a listener can accurately recover the elements in an acoustic signal produced by a speaker, such as phonemes or words (Freeman, Pisoni, Kronenberger, and Castellanos, 2017; Heuven, 2008; Whitehill and Chau, 2004).

Speech intelligibility

The extent to which a listener can accurately recover the elements in an acoustic signal produced by a speaker, such as phonemes or words (Freeman et al., 2017; Heuven, 2008; Whitehill and Chau, 2004).

In the research of speech intelligibility, studies that utilized entropy scores have extensively explored the statistical modeling of data clustering. For instance, Boonen et al. (2021) addressed the data clustering of entropy by employing LMM (refer to Section 3.3.1). However, to the best of the authors’ knowledge, no prior research has specifically addressed the modeling of the bounded nature of the entropy data. Section 3.2.3 and Section 3.3.3 present compelling arguments on the critical importance of considering both in the statistical modeling of data.

Furthermore, no study has attempted to construct a latent variable that unveils the speaker’s (underlying) potential intelligibility. Section 3.4.3 elaborates on the importance of consider measurement error in statistical models, and the role of latent variables to accomplish such task.

Hence, this study aims to fill these gaps by introducing the Bayesian beta-proportion linear, latent and mixed model, a type of generalized linear latent and mixed model (GLLAMM) (Rabe-Hesketh et al., 2004a, 2004b, 2004c; Skrondal and Rabe-Hesketh, 2004). Section 3.1.6 elaborates on the relevance of Bayesian inference procedures for this study.

Research questions

This study aims to address three research questions that provide insights into modeling speech intelligibility using clustered and bounded entropy data.

Research question 1: Does the Bayesian beta-proportion linear, latent and mixed model accounts for all the observed variability in the data and yield better predictions?. More specifically, the study seeks to determine whether the model can improve the predictions of entropy data at different clustering levels (i.e., word, sentence, and speaker level) compared to the ubiquitous normal distributed model.

Research question 2: Can a measure of a speaker’s potential intelligibility be constructed?. In this regard, the study will leverage the principles laid down in Section 3.4.3, to aggregate listener assessment in the form of (manifest) entropy scores into a latent variable that might unveil the speaker’s (latent) potential intelligibility.

Research question 3: Can the speaker’s potential intelligibility be used to test specific research hypotheses?. Particularly, the study aims to demonstrate how the new analytic paradigm could be employed to examine whether child-related factors, such as children’s chronological age and hearing status, contribute to intelligibility.

Data

The data comprised the transcriptions of children’s spontaneous speech samples originally collected by Boonen et al. (2021). The analysis code is illustrated with the dataset, enabling readers to verify the correctness of the implementation. However, the data is not publicly available due privacy restrictions. Nonetheless, the data can provided by the corresponding author upon reasonable request.

Transcription task

For the transcription task, 105 listeners and 320 sentences, with a maximum of 11 words per sentence (speech samples), were randomly assigned to five blocks. Each block consisted of approximately 21 listeners who transcribed 64 sentences presented in a random order. As a result, a total of 47,514 transcribed words were generated from the original 2,263 words present in the speech samples.

Speech samples

Sentences with a minimum of 3 and a maximum of 11 words per sentence.

Entropy calculation

The 47,514 orthographic transcriptions were automatically aligned with a python script at the sentence level, in a column-like grid structure similar to the one presented in Table 1. This alignment process was repeated for each sentence within each speaker and block, and the output was manually checked and adjusted (if needed) in order to appropriately align the words. For more details on the alignment procedure refer to Boonen et al. (2021).

Next, the aligned transcriptions were aggregated by listener, yielding 2,263 entropy scores, one score per word. The entropy scores were calculated following Shannon’s entropy formula (1948):

Entropy formula

\begin{equation} H_{wsib} = H( \boldsymbol{p} ) = \frac{ \sum_{k=1}^{K} p_{k} \cdot log_{2}(p_{k}) }{ log_{2}(J)} \end{equation} \tag{4}

where H_{wsib} denotes the entropy score bounded within the continuous interval between zero and one \left( H_{wsib} \in [0,1] \right), with w denoting the word number, s the sentence number, i the speaker number, and b the block number. Moreover, K describes the number of different word types within transcriptions, p_{k} denotes the proportion of word types within transcriptions, and J defines the total number of word transcriptions. Notice that by design, the total number of word transcriptions J corresponds with the number of listeners per block, i.e., 21 listeners. Furthermore, p_{k} is calculated as follows:

\begin{equation} p_{k} = \frac{ \sum_{j=1}^{J} 1(T_{jk}) }{ J } \end{equation} \tag{5}

where 1(T_{jk}) denotes an indicator function that takes the value of one when the word type k is present in the transcription j.

The resulting entropy scores served as the outcome variable, capturing the agreement or disagreement among listeners’ word transcriptions. High degree of agreement between transcriptions indicate higher intelligibility, and yield lower entropy scores. Low degree of agreement between transcriptions indicate lower intelligibility, and yield higher entropy scores. (Boonen et al., 2021; Faes, De Maeyer, and Gillis, 2021).

Entropy interpretation

High degree of agreement between transcriptions indicate higher intelligibility, and yield lower entropy scores. Low degree of agreement between transcriptions indicate lower intelligibility, and yield higher entropy scores. (Boonen et al., 2021; Faes et al., 2021)

Table 1: Hypothetical alignment of word transcriptions and entropy score calculations for the first sentence, produced by the first speaker assigned to the first block, and transcribed by five listeners \left( s=1, i=1, b=1, J=5 \right). [B] represent a blank space, and [X] an unidentifiable speech. Extracted from Boonen et al. (2021), and slightly modified for illustrative purposes.
Transcription Words
Number 1 2 3 4 5
1 de jongen ziet een kikker
the boy sees a frog
2 de jongen ziet de [X]
the boy sees the [X]
3 de jongen zag [B] kokkin
the boy saw [B] cook
4 de jongen zag geen kikkers
the boy saw no frogs
5 de hond zoekt een [X]
the dog searches a [X]
Entropy 0 0.3109 0.6555 0.8277 1

It is relevant to exemplify the entropy calculation procedure. For that purpose, the words in position two, four and five observed in Table 1 were used. These words were assumed present in the first sentence, produced by the first speaker assigned to the first block, and transcribed by five listeners (w=\{2,4,5\}, s=1, i=1, b=1, J=5).

For the second word, the first four listeners identified the word type jongen (T_{j1}), while the last identified the word type hond (T_{j2}). Therefore, two word types were identified (K=2), with proportions equal to \{ p_{1}, p_{2} \} = \{ 4/5, 1/5 \} = \{ 0.8, 0.2 \}, and entropy score equal to:

H_{2111} = \frac{ 0.8 \cdot log_{2}(0.8) + 0.2 \cdot log_{2}(0.2) }{ log_{2}(5)} \approx 0.3109 For the fourth word, two listeners identified the word type een (T_{j1}), one listener the word type de (T_{j2}), and another the word geen (T_{j3}). It is important to highlight the presence of a blank space [B] in transcription number three. A blank space [B] is a symbol that defines the absence of a word in a space were a word was expected, as compared with other transcriptions. Notice that for calculation purposes, because the blank space was not expected in position three it was considered as a different word type (T_{j4}). Therefore four word types were registered (K=4), with proportions equal to \{ p_{1}, p_{2}, p_{3}, p_{4} \} = \{ 2/5, 1/5, 1/5, 1/5 \} = \{ 0.4, 0.2, 0.2, 0.2 \} and entropy score equal to:

H_{4111} = \frac{ 0.4 \cdot log_{2}(0.4) + 3 \cdot 0.2 \cdot log_{2}(0.2) }{ log_{2}(5)} \approx 0.8277 Lastly, for the fifth word, each listener transcribed a different word. It is important to highlight that when a listener did not managed to identify a complete word, or part of it, (s)he was instructed to write [X] in that position. However, for the calculation of the entropy score, if more than one listener marked an unidentifiable word with [X], each one of them was considered a different word type. This was done to avoid the artificial reduction of the entropy score, as [X] values already indicate the lack of the word’s intelligibility. Consequently, five word types were observed, T_{j1}=kikker, T_{j2}=[X], T_{j3}=kokkin, T_{j4}=kikkers, T_{j5}=[X] (K=5), with proportions equal to \{ p_{1}, p_{2}, p_{3}, p_{4}, p_{5} \} = \{ 1/5, 1/5, 1/5, 1/5, 1/5 \} = \{ 0.2, 0.2, 0.2, 0.2, 0.2 \}, and entropy score equal to:

H_{5111} = \frac{ 5 \cdot 0.2 \cdot log_{2}(0.2) }{ log_{2}(5)} = 1

Exploring the data

As expected, the data exploration reveals two significant features of the entropy scores: clustering and boundedness (refer to Section 3.2.3 and Section 3.3.3). In the case of the entropy scores, clustering arises due to the presence of various word-level scores generated for numerous sentences, originated from different speakers and evaluated in different blocks (see code output below, depicting the first ten observations of the data). On the other hand, entropy scores exhibit boundedness as they can only take on values within the continuous interval between zero and one, particularly H_{wsib} \in [0,1] (see Figure 15 showing three randomly selected children).

Code
var_int = c('block','childID','sentence','word','HS','A','Am','Hwsib')

head( data_H[, var_int], 10 )
1
selecting variables of interest
2
showing the first 10 observations of the data
   block childID sentence word HS  A Am      Hwsib
1      1       1        1    1  2 85 17 0.05703883
2      1       1        1    2  2 85 17 0.27857304
3      1       1        1    3  2 85 17 0.27857304
4      1       1        1    4  2 85 17 0.46073935
5      1       1        1    5  2 85 17 0.11344714
6      1       1        1    6  2 85 17 0.00010000
7      1       1        1    7  2 85 17 0.00010000
8      1       1        1    8  2 85 17 0.00010000
9      2       1        2    1  2 85 17 0.06626602
10     2       1        2    2  2 85 17 0.06626602
Code
require(rethinking)

set.seed( 12345 )
children = sample( 1:32, size=3, replace=F ) 

par(mfrow=c(1,3))

for( i in children ){
  
  idx_child = data_H$childID == i
  
  hist( data_H$Hwsib[idx_child], 
        xlab='', xlim=c(-0.2,1.2), breaks=15, 
        col=rgb(0,0,0,alpha=0.2),
        main=paste0('Child ', i, ', all sentences') )
  abline( v=c(0,1), col='gray', lty=2 )
  
}

par(mfrow=c(1,1))
1
package requirement
2
seed for replication and random sample of three children
3
histogram plot for all sentences of childID

Figure 15: Entropy scores distribution: all sentences of selected children

Additionally, the data shows the 320 children’s speech samples consists of sentences with a minimum of 3 and a maximum of 11 words per sentence (\mu=7.07, \sigma=1.06), where most of the speech samples have between 5 and 9 words per sentence (see Figure 16).

Code
speech_samples = with(data_H, table(childID, sentence) )
speech_samples
1
report speech samples per child and sentence
       sentence
childID  1  2  3  4  5  6  7  8  9 10
     1   8  8  8  7  8  7  7  6  6  5
     2   9  7  7  7  6  6  6  6  5  7
     3   8  8  8  8  8  7  8  7  7  6
     4   8  8  8  7  7  7  7  7  6  6
     5   8  9  7  7  7  7  7  7  6  7
     6   8  6  3  6  6  6  6  5  4  4
     7   9  7  7  7  7  8  4  6  6  6
     8   8  8  8  9  7  7  6  6  6  6
     9   9 11  7  6  7  6  6  5  5  6
     10  9  8  8  9  8  8  7  6  6  6
     11 10  9  7  7  7  7  7  7  6  7
     12  9  8  7  8  8  8  6  7  6  6
     13  7  7  7  7  7  7  7  7  6  6
     14  8  8  8  7  8  6  6  6  5  6
     15  8  8  7  7  7  8  7  8  7  6
     16  8  7  7  7  7  6  6  6  6  6
     17  8  7  7  7  7  6  7  7  6  6
     18  8  8  8  7  7  7  6  6  6  6
     19  9  8  7  7  8  8  6  8  6  6
     20 10  8  8  7  7  7  7  6  6  6
     21  9  8  8  8  7  7  7  8  6  6
     22  9  8  8  7  7  7  7  7  8  7
     23  8  8  7  7  7  7  7  7  6  6
     24  7  7  7  7  7  7  7  8  7  7
     25  9  8  8  7  7  8  8  7  6  6
     26  7  7  7  9  7  7  7  7  7  7
     27  9  8  8  7  7  7  7  7  6  6
     28  9  8  8  8  7  7  7  7  6  6
     29  8  9  8  9  7  7  7  6  6  7
     30  9  7  7  7  7  7  6  6  5  5
     31 11  8  8  8  7  7  7  7  7  6
     32  8  8  8 10  7  8  6  6  6  6
Code
speech_samples = data.frame( speech_samples )
hist(speech_samples$Freq, breaks=20, xlim=c(2, 12),
     main='', xlab='words per sentence')
1
histogram of words per sentences

Figure 16: Histogram of words per sentences in the speech samples
Code
psych::describe( speech_samples$Freq )
1
statistical descriptors for the speech samples
   vars   n mean   sd median trimmed  mad min max range skew kurtosis   se
X1    1 320 7.07 1.06      7    7.04 1.48   3  11     8 0.19     1.45 0.06

Moreover, the data comprised 16 normal hearing children (NH, hearing status category 1) and 16 hearing impaired children, with cochlear implant (HI/CI, hearing status category 2). Most of the NH and HI/CI children had 82 and 85 months of age, respectively. Additionally, the minimum chronological age registered in the sample is 68 months.

Code
d_mom = unique( data_H[,c('childID','HS','A')])
with( d_mom, table( A, HS ) )
1
unique hearing status and chronological age per child
2
number of children per chronological age and hearing status
     HS
A     1 2
  68  0 1
  76  0 1
  78  2 0
  80  1 1
  82  5 0
  83  0 1
  84  0 1
  85  0 6
  86  2 1
  88  1 0
  93  2 2
  94  1 0
  97  1 0
  98  1 0
  104 0 2

After examining the data, it is emphasized that no observations were excluded from the modeling process based on preliminary or univariate procedures. As it is detailed in Section 3.5, the identification of outliers, through univariate procedures is prone to the erroneous exclusion of valuable information, which can ultimately bias the parameter estimates. Consequently, their identification was performed within the context of the proposed models, as recommended by McElreath (2020).

Lastly, before fitting the models using Bayesian inference, the data was formatted as a list including all necessary information for the fitting process:

Code
dlist = list(
  
  N = nrow(data_H),
  B = max(data_H$block),
  I = max(data_H$childID),
  U = max(data_H$sentence),
  W = max(data_H$word),
  
  cHS = max(data_H$hearing_status),
  
  bid = data_H$block,
  cid = data_H$childID,
  uid = data_H$sentence,
  HS = data_H$hearing_status,
  Am = with( data_H, chronological_age - min(chronological_age) ),
  Hwsib = data_H$Hwsib
  
)

str(dlist)
1
Number of observations
2
Maximum number of blocks
3
Maximum number of children
4
Maximum number of sentences
5
Maximum number of words
6
Maximum number of categories in hearing status
7
Observation block ID
8
Observation child ID
9
Observation sentence ID
10
Observation hearing status
11
Observation chronological age
12
Observation entropy score
List of 12
 $ N    : int 2263
 $ B    : int 5
 $ I    : int 32
 $ U    : int 10
 $ W    : int 11
 $ cHS  : int 2
 $ bid  : int [1:2263] 1 1 1 1 1 1 1 1 2 2 ...
 $ cid  : int [1:2263] 1 1 1 1 1 1 1 1 1 1 ...
 $ uid  : int [1:2263] 1 1 1 1 1 1 1 1 2 2 ...
 $ HS   : int [1:2263] 2 2 2 2 2 2 2 2 2 2 ...
 $ Am   : int [1:2263] 17 17 17 17 17 17 17 17 17 17 ...
 $ Hwsib: num [1:2263] 0.057 0.279 0.279 0.461 0.113 ...

Methods

This subsection presents the probabilistic formalism for the statistical models utilized in the current study. It also details the estimation procedure employed to fit the models, and the criteria used to asses the quality of the Bayesian inference results. Lastly, it presents the set of fitted models, and the methodology for selecting among them.

The Bayesian inference procedures are elaborated following closely the When-to-Worry-and-How-to-Avoid-the-Misuse-of-Bayesian-Statistics checklist (WAMBS checklist) (Depaoli and Schoot, 2017). The WAMBS checklist is a questionnaire designed to outline the ten main points that should be meticulously examined when employing Bayesian inference procedures, with the ultimate goal of enhancing the transparency and replicability of the analysis.

WAMBS checklist

Questionnaire designed to outline the ten main points that should be meticulously examined when employing Bayesian inference procedures, with the ultimate goal of enhancing the transparency and replicability of the analysis (Depaoli and Schoot, 2017).

Models

Normal GLLAMM

A generalized linear, latent and mixed model (GLLAMM) (Rabe-Hesketh et al., 2004a, 2004b, 2004c; Skrondal and Rabe-Hesketh, 2004) is a unifying framework used to estimate (non)linear relationship between the mean of (non)normally distributed manifest variables with clustered observations, while also estimating relationships between latent variables and one or more covariates. Under this framework, the normal linear, latent and mixed model is composed of two parts: a response and a structural equation part. The probabilistic representation of the response part can be stated as follows:

Generalized linear, latent and mixed model (GLLAMM)

Unifying framework used to estimate (non)linear relationship between the mean of (non)normally distributed manifest variables with clustered observations, while also estimating relationships between latent variables and one or more covariates (Rabe-Hesketh et al., 2004a, 2004b, 2004c; Skrondal and Rabe-Hesketh, 2004).

\begin{align} H_{wsib} & \sim \text{Normal} \left( \mu_{ib}, \sigma_{i} \right) \\ \mu_{ib} &= a_{b} - SI_{i} \end{align} \tag{6}

where H_{wsib} denotes the manifest word-level entropy scores, for sentence s and child i in block b. Furthermore, the \mu_{ib} denotes the mean of the outcome and \sigma_{i} its standard deviation. Additionally, a_{b} denotes the block random effect, and SI_{i} denotes the speaker’s (latent) potential intelligibility score. On the other hand, the structural equation part, which relates the children’s potential intelligibility to the child related factors, can be represented probabilistically in the following manner:

\begin{align} SI_{i} = \alpha + \alpha_{HS[i]} + \beta_{A, HS[i]} (A_{i} - \bar{A}) + e_{i} + u_{i} \end{align} \tag{7}

where HS_{i} defines the hearing status and A_{i} the chronological age in months, of children i. \alpha defines the general intercept, and \alpha_{HS[i]} denotes the potential intelligibility for different hearing status groups. Moreover, \beta_{A,HS[i]} denotes the evolution of potential intelligibility per unit of chronological age for each hearing status group. Lastly, e_{i} denotes the unexplained potential intelligibility differences between children, and u_{i} the average unexplained potential intelligibility variability among sentences for each children, defined as u_{i} = \sum_{s=1}^{S} u_{si}/S with S denoting the total number of sentences per child.

Five features can be noticed in the previous probabilistic representations. First, the full definition of the model shows four of the five components of a Bayesian GLLAMM: the response model, with its likelihood component, linear predictor and link function; and the structural equation part. The fifth component, priors and hyperpriors, is fully developed in Section 7.2. Equation 6 reveals that for the likelihood component of the response model, the outcome is assumed to be normally distributed with mean \mu_{ib} and standard deviation \sigma_{i}. Moreover, The linear predictor of the response model is a linear combination of the parameters a_{b} and SI_{i} with a direct link function, implying the absence of a transformation of the linear predictor. Lastly, Equation 7 shows that the structural equation part relates the potential intelligibility with the child related factors.

GLLAMM components

  1. Response model, random component
  2. Response model, linear predictor
  3. Response model, link function
  4. Structural equation model
  5. Priors and Hyperpriors

Second, Equation 6 reveals the response model representation can consider different standard deviation per child (indicated by the subscript i). These trait effectively make the regression models robust to small departures from the normal distributional assumption or to the presence of outliers (refer to Section 3.5).

Third, Equation 6 also shows that in the response model representation, the potential intelligibility has a negative linear relationship with the average entropy scores. This feature makes explicit the inverse relationship between the potential intelligibility and entropy scores detailed in Section 6.2.

Fourth, Equation 7 shows the chronological age is centered around a minimum age \bar{A}. The centering procedure is done to facilitate the interpretation of the parameters (Everitt and Skrondal, 2010). Specifically, to the centering is done avoid interpreting the parameter estimates outside the range of the chronological ages available in the data.

Centering

Procedure use to facilitate the interpretation of regression parameters (Everitt and Skrondal, 2010).

Fifth and last, Equation 7 shows the model assumes the potential intelligibility has two sources of unexplained variability. One from the specific selection of children e_{i}, and another from the specific selection of sentences for each children u_{i}. While e_{i} assumes that different children inherently have different levels of potential intelligibility, u_{i} assumes that different sentences measure differently the potential intelligibility of children. The latter is assumed to be due to the different word difficulties and their interplay among each other within the sentence.

Beta-proportion GLLAMM

In a similar fashion, the proposed beta-proportion linear, latent and mixed model was composed of two parts: a response and a structural part. The response part assumed the following probabilistic structure:

\begin{align} H_{wsib} & \sim \text{BetaProp} \left( \mu_{ib}, M_{i} \right) \\ \mu_{ib} &= \text{logit}^{-1}( a_{b} - SI_{i} ) \end{align} \tag{8}

On the other hand, the structural equation part is defined as in Equation 6, i.e., in the same way as in the normal linear, latent and mixed model.

Similar to the normal GLLAMM, the probabilistic representation of the beta-proportion GLLAMM has three main features. First, the representation also highlight four of the five components of a GLLAMM. For the likelihood component of the response model, it is assumed the outcome is beta-proportional distributed with mean \mu_{ib} and sample size M_{i}. The linear predictor of the response model is a linear combination of the parameters a_{b} and SI_{i}, with an inverse-logit link function \text{logit}^{-1}(x) = exp(x) / (1+exp(x)). Lastly, the structural equation part also relates the children’s potential intelligibility with the child related factors.

Second, Equation 8 reveals the response model representation also considers different sample sizes per child (indicated by the subscript in M_{i}). As in the normal GLLAMM, models with this trait were dubbed robust regression models (refer to Section 3.5).

Third and last, equation Equation 8 shows in the response model representation, the children’s potential intelligibility has a negative (non)linear relationship with the entropy scores. This feature still makes explicit the inverse relationship between the children’s potential intelligibility and entropy scores, as seen in Section 6.2.

Prior distributions

In this section, prior predictive simulations are conducted for all parameters that necessitate prior assumptions. Prior predictive simulation is a procedure employed to generate data based on the prior distributions. The purpose of the procedure is to assign sensible priors and comprehend their implications in the context of a model, before integrating any information derived from empirical data (McElreath, 2020).

Prior predictive simulations

Procedure employed to generate data based on the prior distributions. The purpose of the procedure is to assign sensible priors and comprehend their implications in the context of a model, before integrating any information derived from empirical data (McElreath, 2020).

The procedure involved simulating parameters independently in either the structural or response part of the models. These parameters were later transformed into the entropy score scale through the response part. The specific steps employed are found in the adjacent code within the parameter’s corresponding sections 7.

The probabilistic representations for the normal GLLAMM show that seven parameters require assumption of a prior distribution (refer to Section 7.1.1). Similarly, the representation for the beta-proportion GLLAMM requires an additional seven prior distribution assumptions (refer to Section 7.1.2).

The parameters found in the structural part of both models can be enumerated as follows:

  1. the general intercept \alpha,
  2. the hearing status effects \alpha_{HS[i]},
  3. the chronological age per hearing status effects \beta_{A,HS[i]},
  4. the child variability e_{i},
  5. the child-sentence average variability u_{i}

Furthermore, the parameters present in the response part of either model are:

  1. the random block effect a_{b}, present in both models,
  2. the standard deviation \sigma_{i}, present only in the normal GLLAMMM, and
  3. the sample size M_{i}, present only in the beta-proportion GLLAMM,

Lastly, after the establishment of all the prior distributions, a prior predictive simulations are also provided for the children potential intelligibility SI_{i}.

Intercepts \alpha

The parameter’s prior distribution for the normal and beta-proportion GLLAMM is described in Equation 9. The distribution serves as an informative prior, which aims to set the scale for the speaker’s (latent) potential intelligibility, a requirement often seen in latent variable models (Depaoli, 2021) (refer to Section 3.1.4).

\begin{align} \alpha &\sim \text{Normal} \left( 0, 0.05 \right) \end{align} \tag{9}

The left panel of Figure 18 shows the prior restricts \alpha to be mostly between the range of [-0.1, 0.1]. The implications for the scale of potential intelligibility are: (1) no particular bias towards a positive or negative intercept is evident, and (2) the scale is initially centered around zero with high certainty.

On the other hand, the right panel of Figure 17 demonstrate that when the \alpha prior is transformed to the entropy scale under the normal GLLAMM model, the model anticipates a concentration of data around lower levels of entropy, and beyond the feasible range of the outcome. This prior might seem inappropriate at first glance, but other priors hide even more undesirable assumption.

Code
require(rethinking)

n = 1000

param_pscale = rnorm( n=n, mean=0, sd=0.05 )
param_oscale = -1*param_pscale

par(mfrow=c(1,2))

dens( param_pscale, xlim=c(-1,1),
      show.HPDI=0.95,
      main='Intelligibility scale',
      xlab=expression(alpha))

dens( param_oscale, xlim=c(0,1),
      show.HPDI=0.95,
      main='Entropy scale', 
      xlab=expression(H[wsib]) )
abline( v=c(0, 1), lty=2, col='gray')

par(mfrow=c(1,1))
1
package requirement
2
simulated sample size
3
Normal GLLAMM general intercept prior distribution definition: intelligibility scale
4
Normal GLLAMM general intercept prior distribution definition: entropy scale
5
Normal GLLAMM general intercept prior distribution density plot: unrestricted intelligibility scale
6
Normal GLLAMM general intercept prior distribution density plot: bounded entropy scale

Figure 17: Normal GLLAMM, general intercept prior distribution: intelligibility and entropy scale

In contrast, the right panel of Figure 18, demonstrate that when the \alpha prior is transformed to the entropy scale under the beta-proportion GLLAMM, the model anticipates a concentration of data around mid levels of entropy, and not beyond the feasible range of the outcome. This implies that no particular bias towards positive or negative entropy scores among the hearing status groups are expected by the model.

Code
require(rethinking)

n = 1000

param_pscale = rnorm( n=n, mean=0, sd=0.05 )
param_oscale = inv_logit( -1*param_pscale )

par(mfrow=c(1,2))

dens( param_pscale, xlim=c(-1,1),
      show.HPDI=0.95,
      main='Intelligibility scale', 
      xlab=expression(alpha))

dens( param_oscale, xlim=c(0,1),
      show.HPDI=0.95,
      main='Entropy scale', 
      xlab=expression(H[wsib]) )
abline( v=c(0, 1), lty=2, col='gray')

par(mfrow=c(1,1))
1
package requirement
2
simulated sample size
3
Beta-proportion GLLAMM general intercept prior distribution definition: intelligibility scale
4
Beta-proportion GLLAMM general intercept prior distribution definition: entropy scale
5
Beta-proportion GLLAMM general intercept prior distribution density plot: unrestricted intelligibility scale
6
Beta-proportion GLLAMM general intercept prior distribution density plot: bounded entropy scale

Figure 18: Beta-proportion GLLAMM, general intercept prior distribution: intelligibility and entropy scale

Hearing status effects \alpha_{HS[i]}

The prior distribution of the parameters for the normal and beta-proportion are described in Equation 10. For each of the two hearing status categories present in the data there is one prior (refer to Section 6.3). Notably, the same prior is applied to both categories. This choice implies that the parameters for each category are presumed to have similar uncertainties prior to the observation of empirical data.

\begin{align} \alpha_{HS[i]} &\sim \text{Normal} \left( 0, 0.3 \right) \end{align} \tag{10}

The left panel of Figure 19 reveal a weakly informative prior that restricts the range of probability of \alpha_{HS[i]} only between [-0.6, 0.6] (refer to Section 3.1.4). This implies that no particular bias towards positive or negative values for the potential intelligibility of different hearing status groups is present in the priors.

However, the right panel of Figure 19 demonstrate that when the prior of \alpha_{HS[i]} is transformed to the entropy scale, the model anticipates a concentration of data around lower levels of entropy, and outside the feasible range of the outcome. This prior might seem inappropriate at first glance, but other priors hide even more undesirable assumption.

Code
require(rethinking)

n = 1000

param_pscale = rnorm( n=n, mean=0, sd=0.3 )
param_oscale = -1*param_pscale

par(mfrow=c(1,2))

dens( param_pscale, xlim=c(-1,1),
      show.HPDI=0.95,
      main='Intelligibility scale', 
      xlab=expression(alpha[HS]))

dens( param_oscale, xlim=c(0,1),
      show.HPDI=0.95,
      main='Entropy scale', 
      xlab=expression(H[wsib]) )
abline( v=c(0, 1), lty=2, col='gray')

par(mfrow=c(1,1))
1
package requirement
2
simulated sample size
3
Normal GLLAMM hearing status effects prior distribution definition: intelligibility scale
4
Normal GLLAMM hearing status effects prior distribution definition: entropy scale
5
Normal GLLAMM hearing status effects prior distribution density plot: unrestricted intelligibility scale
6
Normal GLLAMM hearing status effects prior distribution density plot: bounded entropy scale

Figure 19: Normal GLLAMM, hearing status effects prior distribution: intelligibility and entropy scale

Conversely, the right panel of Figure 20, demonstrate that when the \alpha_{HS[i]} prior is transformed to the entropy scale under the beta-proportion GLLAMM, the model anticipates a concentration of data around mid levels of entropy, and not beyond the feasible range of the outcome. This implies that no particular bias towards positive or negative entropy scores are expected by the model.

Code
require(rethinking)

n = 1000

param_pscale = rnorm( n=n, mean=0, sd=0.5 )
param_oscale = inv_logit( -1*param_pscale )

par(mfrow=c(1,2))

dens( param_pscale, xlim=c(-1,1),
      show.HPDI=0.95,
      main='Intelligibility scale', 
      xlab=expression(alpha[HS]))

dens( param_oscale, xlim=c(0,1),
      show.HPDI=0.95,
      main='Entropy scale', 
      xlab=expression(H[wsib]) )
abline( v=c(0, 1), lty=2, col='gray')

par(mfrow=c(1,1))
1
package requirement
2
simulated sample size
3
Beta-proportion GLLAMM hearing status effects prior distribution definition: intelligibility scale
4
Beta-proportion GLLAMM hearing status effects prior distribution definition: entropy scale
5
Beta-proportion GLLAMM hearing status effects prior distribution density plot: unrestricted intelligibility scale
6
Beta-proportion GLLAMM hearing status effects prior distribution density plot: bounded entropy scale

Figure 20: Beta-proportion GLLAMM, hearing status effects prior distribution: intelligibility and entropy scale

Chronological age per hearing status \beta_{A,HS[i]}

The prior distribution of \beta_{A,HS[i]} for the normal and beta-proportion GLLAMM is described in Equation 11. For each of the two hearing status categories present in the data there is one prior (refer to Section 6.3). Notably, the same prior is applied to both categories. This choice implies that the evolution of potential intelligibility attributed to chronological age between the categories is presumed to have similar uncertainties prior to the observation of empirical data. Furthermore, even when the model did not consider the interaction of chronological age and hearing status (\beta_{A} instead of \beta_{A,HS[i]}), the same prior was utilized (refer to Section 7.6).

\begin{align} \beta_{A,HS[i]} &\sim \text{Normal} \left( 0, 0.1 \right) \end{align} \tag{11}

The left panel of Figure 21 shows a weakly informative prior that restricts \beta_{A,HS[i]} to be within the range of [-0.2, 0.2]. This implies that no particular bias towards positive or negative effects is present for the evolution of potential intelligibility due to chronological age per hearing status group.

The right panel of Figure 21, on the other hand, demonstrate that when the prior of \beta_{A,HS[i]} is transformed to the entropy scale, the model anticipates a concentration of lower entropy values. More importantly, the model expects entropy scores beyond the feasible range of the outcome. This assumption may initially appear inappropriate, however, considering again the probabilistic representation of the model, other priors hide even more undesirable assumptions.

Code
require(rethinking)

n = 1000

param_pscale = rnorm( n=n, mean=0, sd=0.1 )
usd = function(i){ param_pscale * data_H$Am[i] }
param_mscale = sapply( 1:length(data_H$Am), usd )
param_oscale = -1*param_mscale

par(mfrow=c(1,2))

dens( param_pscale, xlim=c(-1,1),
      show.HPDI=0.95,
      main='Intelligibility scale', 
      xlab=expression(beta[AHS]))

dens( param_oscale, xlim=c(0,1),
      show.HPDI=0.95,
      main='Entropy scale', 
      xlab=expression(H[wsib]) )
abline( v=c(0, 1), lty=2, col='gray')

par(mfrow=c(1,1))
1
package requirement
2
simulated sample size
3
Normal GLLAMM chronological age per hearing status effects prior distribution definition: intelligibility scale
4
user defined function: effects per chronological age
5
Normal GLLAMM chronological age per hearing status effects prior distribution calculation: intelligibility scale
6
Normal GLLAMM chronological age per hearing status effects prior distribution definition: entropy scale
7
Normal GLLAMM chronological age per hearing status effects prior distribution density plot: unrestricted intelligibility scale
8
Normal GLLAMM chronological age per hearing status effects prior distribution density plot: bounded entropy scale

Figure 21: Normal GLLAMM, chonological age per hearing status effects prior distribution: intelligibility and entropy scale

In contrast, the right panel of Figure 22, demonstrate that when the prior of \beta_{A,HS[i]} is transformed to the entropy scale, the model anticipates a slight concentration of data around mid levels of entropy, but more importantly, it does not expect data beyond the feasible range of the outcome. This implies that no particular bias towards positive or negative effects for the evolution of potential intelligibility per hearing status group is present.

Code
require(rethinking)

n = 1000

param_pscale = rnorm( n=n, mean=0, sd=0.1 )
usd = function(i){ param_pscale * data_H$Am[i] }
param_mscale = sapply( 1:length(data_H$Am), usd )
param_oscale = inv_logit( -1*param_mscale )

par(mfrow=c(1,2))

dens( param_pscale, xlim=c(-1,1),
      show.HPDI=0.95,
      main='Intelligibility scale', 
      xlab=expression(beta[AHS]))

dens( param_oscale, xlim=c(0,1),
      show.HPDI=0.95,
      main='Entropy scale', 
      xlab=expression(H[wsib]) )
abline( v=c(0, 1), lty=2, col='gray')

par(mfrow=c(1,1))
1
package requirement
2
simulated sample size
3
Beta-proportion GLLAMM chronological age per hearing status effects prior distribution definition: intelligibility scale
4
user defined function: effects per chronological age
5
Beta-proportion GLLAMM chronological age per hearing status effects prior distribution calculation: intelligibility scale
6
Beta-proportion GLLAMM chronological age per hearing status effects prior distribution definition: entropy scale
7
Beta-proportion GLLAMM chronological age per hearing status effects prior distribution density plot: unrestricted intelligibility scale
8
Beta-proportion GLLAMM chronological age per hearing status effects prior distribution density plot: bounded entropy scale

Figure 22: Beta-proportion GLLAMM, chronological age per hearing status effects prior distribution: intelligibility and entropy scale

Child differences e_{i}

The parameter e_{i} was defined in terms of hyperparameters (refer to Section 3.1.5). Specifically, the prior distribution of e_{i} for the normal and beta-proportion GLLAMM is described in Equation 12. For each of the children present in the data there is one prior (refer to Section 6.3). Notably, the same prior is applied to all children with a common mean and standard deviation defined by the hyperparameters. This choice implies that potential intelligibility differences between children are presumed to have similar uncertainties prior to the observation of empirical data, and that these are governed by a set of common parameters.

\begin{align} m_{i} &\sim \text{Normal} \left( 0, 0.05 \right) \\ s_{i} &\sim \text{Exponential} \left( 2 \right) \\ e_{i} &\sim \text{Normal} \left( m_{i}, s_{i} \right) \end{align} \tag{12}

The left panel of Figure 23 shows a weakly informative prior that restricts the potential intelligibility differences to be in a range of [-1.5, 1.5]. This implies that differences in potential intelligibility between children can be as large as three units of measurement.

However, the right panel of Figure 23 demonstrate that when the prior of e_{i} is transformed to the entropy scale under the normal GLLAMM, the model anticipates again a concentration of data around lower levels of entropy, and even it expects the differences to go beyond the feasible range of the outcome. This assumption may initially appear inappropriate, however, considering again the probabilistic representation of the model, other priors hide even more undesirable assumptions.

Code
require(rethinking)

n = 1000

m_i = rnorm( n=n, mean=0, sd=0.05 )
s_i = rexp(n=n, rate=2 )
param_pscale = rnorm( n=n, mean=m_i, sd=s_i )
param_oscale = -1*param_pscale

par(mfrow=c(1,2))

dens( param_pscale, xlim=c(-2,2),
      show.HPDI=0.95,
      main='Intelligibility scale', 
      xlab=expression(e[i]))

dens( param_oscale, xlim=c(0,1),
      show.HPDI=0.95,
      main='Entropy scale', 
      xlab=expression(H[wsib]) )
abline( v=c(0, 1), lty=2, col='gray')

par(mfrow=c(1,1))
1
package requirement
2
simulated sample size
3
mean hyperparameter prior distribution definition
4
standard deviation hyperparameter prior distribution definition
5
Normal GLLAMM child differences prior distribution calculation: intelligibility scale
6
Normal GLLAMM child differences prior distribution definition: entropy scale
7
Normal GLLAMM child differences prior distribution density plot: unrestricted intelligibility scale
8
Normal GLLAMM child differences prior distribution density plot: bounded entropy scale

Figure 23: Normal GLLAMM, children differences prior distribution: intelligibility and entropy scale

Conversely, the right panel of Figure 24, demonstrate that when the prior of e_{i} is transformed to the entropy scale under the beta-proportion GLLAMM, the model anticipates a high concentration of data around mid levels of entropy. However, it does not expect data beyond the feasible range of the outcome. This implies that no particular bias towards positive or negative differences in potential intelligibility between children are expected at the entropy scale level.

Code
require(rethinking)

n = 1000

m_i = rnorm( n=n, mean=0, sd=0.05 )
s_i = rexp(n=n, rate=2 )
param_pscale = rnorm( n=n, mean=m_i, sd=s_i )
param_oscale = inv_logit( -1*param_pscale )

par(mfrow=c(1,2))

dens( param_pscale, xlim=c(-2,2),
      show.HPDI=0.95,
      main='Intelligibility scale', 
      xlab=expression(e[i]))

dens( param_oscale, xlim=c(0,1),
      show.HPDI=0.95,
      main='Entropy scale', 
      xlab=expression(H[wsib]) )
abline( v=c(0, 1), lty=2, col='gray')

par(mfrow=c(1,1))
1
package requirement
2
simulated sample size
3
mean hyperparameter prior distribution definition
4
standard deviation hyperparameter prior distribution definition
5
Beta-proportion GLLAMM child differences prior distribution calculation: intelligibility scale
6
Beta-proportion GLLAMM child differences prior distribution definition: entropy scale
7
Beta-proportion GLLAMM child differences prior distribution density plot: unrestricted intelligibility scale
8
Beta-proportion GLLAMM child differences prior distribution density plot: bounded entropy scale

Figure 24: Beta-proportion GLLAMM, children differences prior distribution: intelligibility and entropy scale

Sentence-child average differences u_{i}

The parameter u_{i} was defined in terms of hyperparameters (refer to Section 3.1.5). Specifically, the prior distribution of u_{i} for the normal and beta-proportion GLLAMM is described in Equation 13. For each sentence within children present in the data there is one prior, which are later aggregated across all sentences S, as defined in Section 7.1.1. Notably, the same prior is applied to all children with a common mean and standard deviation defined by the hyperparameters. This choice implies that the average potential intelligibility differences among sentences within children are presumed to have similar uncertainties prior to the observation of empirical data, and that these are governed by a set of common parameters.

\begin{align} m_{u} &\sim \text{Normal} \left( 0, 0.05 \right) \\ s_{u} &\sim \text{Exponential} \left( 2 \right) \\ u_{si} &\sim \text{Normal} \left( m_{u}, s_{u} \right) \\ u_{i} &= \sum_{s=1}^{S} \frac{u_{si}}{S} \end{align} \tag{13}

The left panel of Figure 25 shows a weakly informative prior that restricts the u_{i} to be in a range between [-0.5, 0.5]. This implies that average differences in potential intelligibility among sentences within children can be as large as one units of measurement.

The right panel of Figure 25, in contrast, demonstrate that when the prior of u_{i} is transformed to the entropy scale, the model anticipates a concentration of data around lower levels of entropy. More importantly, the model expects the differences to go beyond the feasible range of the outcome. Again, this prior may initially appear inappropriate but other priors hide even more undesirable assumptions.

Code
require(rethinking)

n = 1000

m_i = rnorm( n=n, mean=0, sd=0.05 )
s_i = rexp(n=n, rate=2 )
param_pscale = replicate(
  n=n, expr=mean( rnorm( n=10, mean=m_i, sd=s_i ) ) ) 
param_oscale = -1*param_pscale

par(mfrow=c(1,2))

dens( param_pscale, xlim=c(-1,1),
      show.HPDI=0.95,
      main='Intelligibility scale', 
      xlab=expression(u[i]))

dens( param_oscale, xlim=c(0,1),
      show.HPDI=0.95,
      main='Entropy scale', 
      xlab=expression(H[wsib]) )
abline( v=c(0, 1), lty=2, col='gray')

par(mfrow=c(1,1))
1
package requirement
2
simulated sample size
3
mean hyperparameter prior distribution definition
4
standard deviation hyperparameter prior distribution definition
5
Normal GLLAMM sentence-child average differences prior distribution calculation: intelligibility scale
6
Normal GLLAMM sentence-child average differences prior distribution definition: entropy scale
7
Normal GLLAMM sentence-child average differences prior distribution density plot: unrestricted intelligibility scale
8
Normal GLLAMM sentence-child average differences prior distribution density plot: bounded entropy scale

Figure 25: Normal GLLAMM, sentence-children average differences prior distribution: intelligibility and entropy scale

, Conversely, the right panel of Figure 26, demonstrate that when the prior of u_{i} is transformed to the entropy scale under the beta-proportion GLLAMM, the model anticipates a high concentration of data around mid levels of entropy. However, it does not expect data beyond the feasible range of the outcome. This implies that no particular bias towards positive or negative differences between children are expected at the entropy scale level.

Code
require(rethinking)

n = 1000

m_i = rnorm( n=n, mean=0, sd=0.05 )
s_i = rexp(n=n, rate=2 )
param_pscale = replicate(
  n=n, expr=mean( rnorm( n=10, mean=m_i, sd=s_i ) ) ) 
param_oscale = inv_logit( -1*param_pscale )

par(mfrow=c(1,2))

dens( param_pscale, xlim=c(-1,1),
      show.HPDI=0.95,
      main='Intelligibility scale', 
      xlab=expression(u[i]))

dens( param_oscale, xlim=c(0,1),
      show.HPDI=0.95,
      main='Entropy scale', 
      xlab=expression(H[wsib]) )
abline( v=c(0, 1), lty=2, col='gray')

par(mfrow=c(1,1))
1
package requirement
2
simulated sample size
3
mean hyperparameter prior distribution definition
4
standard deviation hyperparameter prior distribution definition
5
Beta-proportion GLLAMM sentence-child average differences prior distribution calculation: intelligibility scale
6
Beta-proportion GLLAMM sentence-child average differences prior distribution definition: entropy scale
7
Beta-proportion GLLAMM sentence-child average differences prior distribution density plot: unrestricted intelligibility scale
8
Beta-proportion GLLAMM sentence-child average differences prior distribution density plot: bounded entropy scale

Figure 26: Beta-proportion GLLAMM, sentence-children average differences prior distribution: intelligibility and entropy scale

Random block effect a_{b}

The parameter a_{b} was also defined in terms of hyperparameters (refer to Section 3.1.5). Specifically, the prior distribution of a_{b} for the normal and beta-proportion GLLAMM is described in Equation 14. For each block present in the data there is one prior (refer to Section 6.3). The same prior is applied to all blocks with a common mean and standard deviation defined by the hyperparameters. This choice implies that block differences are presumed to have similar uncertainties prior to the observation of empirical data, and that these are governed by a set of common parameters.

\begin{align} m_{b} &\sim \text{Normal} \left( 0, 0.05 \right) \\ s_{b} &\sim \text{Exponential} \left( 2 \right) \\ a_{b} &\sim \text{Normal} \left( m_{b}, s_{b} \right) \end{align} \tag{14}

The left panel of Figure 27 shows a weakly informative prior restricting a_{b} to be in a range between [-1.5, 1.5], implying no particular bias towards positive or negative differences between blocks are present in the priors.

Nevertheless, the right panel of Figure 27 demonstrate that when the prior of a_{b} is transformed to the entropy scale under the normal GLLAMM, the model anticipates a concentration of data around lower levels of entropy. Furthermore, the model expects the differences to go beyond the feasible range of the outcome.

Code
require(rethinking)

n = 1000

m_i = rnorm( n=n, mean=0, sd=0.05 )
s_i = rexp(n=n, rate=2 )
param_pscale = rnorm( n=n, mean=m_i, sd=s_i )
param_oscale = -1*param_pscale

par(mfrow=c(1,2))

dens( param_pscale, xlim=c(-2,2),
      show.HPDI=0.95,
      main='Intelligibility scale', 
      xlab=expression(u[si]))

dens( param_oscale, xlim=c(0,1),
      show.HPDI=0.95,
      main='Entropy scale', 
      xlab=expression(H[wsib]) )
abline( v=c(0, 1), lty=2, col='gray')

par(mfrow=c(1,1))
1
package requirement
2
simulated sample size
3
mean hyperparameter prior distribution definition
4
standard deviation hyperparameter prior distribution definition
5
Normal GLLAMM block differences prior distribution calculation: intelligibility scale
6
Normal GLLAMM block differences prior distribution definition: entropy scale
7
Normal GLLAMM block differences prior distribution density plot: unrestricted intelligibility scale
8
Normal GLLAMM block differences prior distribution density plot: bounded entropy scale

Figure 27: Normal GLLAMM, block differences prior distribution: intelligibility and entropy scale

In contrast, the right panel of Figure 28, demonstrate that when the prior of a_{b} is transformed to the entropy scale under the beta-proportion GLLAMM, the model anticipates a high concentration of data around mid levels of entropy, but more importantly, it does not expect data beyond the feasible range of the outcome. This implies that no particular bias towards positive or negative differences between blocks are expected at the entropy scale level.

Code
require(rethinking)

n = 1000

m_i = rnorm( n=n, mean=0, sd=0.05 )
s_i = rexp(n=n, rate=2 )
param_pscale = rnorm( n=n, mean=m_i, sd=s_i )
param_oscale = inv_logit( -1*param_pscale )

par(mfrow=c(1,2))

dens( param_pscale, xlim=c(-2,2),
      show.HPDI=0.95,
      main='Intelligibility scale', 
      xlab=expression(u[si]))

dens( param_oscale, xlim=c(0,1),
      show.HPDI=0.95,
      main='Entropy scale', 
      xlab=expression(H[wsib]) )
abline( v=c(0, 1), lty=2, col='gray')

par(mfrow=c(1,1))
1
package requirement
2
simulated sample size
3
mean hyperparameter prior distribution definition
4
standard deviation hyperparameter prior distribution definition
5
Beta-proportion GLLAMM block differences prior distribution calculation: intelligibility scale
6
Beta-proportion GLLAMM block differences prior distribution definition: entropy scale
7
Beta-proportion GLLAMM block differences prior distribution density plot: unrestricted intelligibility scale
8
Beta-proportion GLLAMM block differences prior distribution density plot: bounded entropy scale

Figure 28: Beta-proportion GLLAMM, block differences prior distribution: intelligibility and entropy scale

Standard deviation \sigma_{i}

The prior distribution of \sigma_{i} for the normal GLLAMM is described in Equation 15. For each of the children present in the data there is one prior (refer to Section 6.3). Notably, the same prior is applied to all \sigma_{i}. This choice implies that the parameters for the unexplained word-level entropy variability of each child are presumed to have similar uncertainties prior to the observation of empirical data. Furthermore, even when the model did not consider this robust feature, and estimated one common variability for all children (\sigma instead of \sigma_{i}), the same prior was utilized (see Section 7.1.1 and Section 7.6).

\begin{align} \sigma_{i} &\sim \text{Exponential}\left( 2 \right) \end{align} \tag{15}

The left panel of Figure 29 shows the weakly informative prior expects \sigma_{i} to be possible only in a positive range and more likely between [0, 1.5], as it is required for variability parameters (Depaoli, 2021). However, the right panel of Figure 29 shows that when the prior of \sigma_{i} is transformed to the entropy scale, the model expect that certain scores fall beyond the feasible range of the outcome. This assumption may initially appear inappropriate. However, considering again the probabilistic representation of the model, other priors under the normal GLLAMM hide even more undesirable assumptions.

Code
require(rethinking)

n = 1000

param_pscale = rexp( n=n, rate=2 )
param_oscale = rnorm( n=n, mean=0, sd=param_pscale )

par(mfrow=c(1,2))

dens( param_pscale, xlim=c(0,4),
      show.HPDI=0.95,
      main='Parameter scale', 
      xlab=expression(sigma[i]))

dens( param_oscale, xlim=c(0,1),
      show.HPDI=0.95,
      main='Entropy scale', 
      xlab=expression(H[wsib]) )
abline( v=c(0, 1), lty=2, col='gray')

par(mfrow=c(1,1))
1
package requirement
2
simulated sample size
3
Normal GLLAMM word-level entropy variability prior distribution definition: parameter scale
4
Normal GLLAMM word-level entropy variability prior distribution definition: entropy scale
5
Normal GLLAMM word-level entropy variability prior distribution density plot: restricted parameter scale
6
Normal GLLAMM word-level entropy variability distribution density plot: bounded entropy scale

Figure 29: Normal GLLAMM, word-level entropy unexplained variability prior distribution: parameter and entropy scale

Sample size M_{i}

The prior distribution of M_{i} for the beta-proportion GLLAMM is described in Equation 16. Similar to the previous parameter, for each of the children present in the data there is one prior (refer to Section 6.3). Notably, the same prior is applied to all M_{i}. This choice implies that the parameters for the unexplained word-level entropy variability of each child are presumed to have similar uncertainties prior to the observation of empirical data. Furthermore, even when the model did not consider this robust feature and estimated one common variability for all children (M instead of M_{i}), the same prior was utilized. (see Section 7.1.2 and Section 7.6).

\begin{align} M_{i} &\sim \text{Exponential}\left( 0.5 \right) \end{align} \tag{16}

The left and right panel of Figure 30, demonstrate the prior of M_{i} expects the parameters to be in a reasonable positive range between [0, 5], and within the boundaries of the entropy scores. This implies that no particular bias is present in the word-level entropy unexplained variability, only that it is positive, as expected for measures of variability.

Code
require(rethinking)

n = 1000

param_pscale = rexp( n=n, rate=0.5 )
param_oscale = inv_logit( -1*param_pscale )

par(mfrow=c(1,2))

dens( param_pscale, xlim=c(0,10),
      show.HPDI=0.95,
      main='Parameter scale', 
      xlab=expression(M[i]))

dens( param_oscale, xlim=c(0,1),
      show.HPDI=0.95,
      main='Entropy scale', 
      xlab=expression(H[wsib]) )
abline( v=c(0, 1), lty=2, col='gray')

par(mfrow=c(1,1))
1
package requirement
2
simulated sample size distribution definition
3
Beta-proportion GLLAMM sample size variability prior distribution definition: parameter scale
4
Beta-proportion GLLAMM sample size variability prior distribution definition: entropy scale
5
Beta-proportion GLLAMM sample size variability prior distribution density plot: restricted parameter scale
6
Beta-proportion GLLAMM sample size variability distribution density plot: bounded entropy scale

Figure 30: Beta-proportion GLLAMM, word-level entropy unexplained variability prior distribution: intelligibility and entropy scale

Speech intelligibility SI_{i}

After the careful assessment of the prior implications for each parameter, the expected prior distribution for the potential intelligibility can be constructed. The prior predictive simulation for SI_{i} under the normal and beta-proportion GLLAMM can be described as in Equation 17:

\begin{align} \alpha &\sim \text{Normal} \left( 0, 0.05 \right) \\ \alpha_{HS[i]} &\sim \text{Normal} \left( 0, 0.3 \right) \\ \beta_{A,HS[i]} &\sim \text{Normal} \left( 0, 0.1 \right) \\ m &\sim \text{Normal} \left( 0, 0.05 \right) \\ s &\sim \text{Exponential} \left( 2 \right) \\ e_{i} &\sim \text{Normal} \left( m, s \right) \\ u_{si} &\sim \text{Normal} \left( m, s \right) \\ u_{i} &= \sum_{s=1}^{S} \frac{u_{si}}{S} \\ SI_{si} &= \alpha + \alpha_{HS[i]} + \beta_{A, HS[i]} (A_{i} - \bar{A}) + e_{i} + u_{i} \\ \end{align} \tag{17}

The left panel of Figure 32 shows the prior restricts the potential intelligibility of children to be mostly between [-6, 6], implying no particular bias towards positive or negative potential intelligibility is present in the priors.

However, the right panel of Figure 31, demonstrate that when the prior of SI_{i} is transformed to the entropy scale under the normal GLLAMM, the model anticipates prediction of entropy scores beyond its feasible range.

Code
require(rethinking)

n = 1000

a = rnorm( n=n, mean=0, sd=0.05 )
aHS = rnorm( n=n, mean=0, sd=3 )
bAHS = rnorm( n=n, mean=0, sd=0.1 )

m = rnorm( n=n, mean=0, sd=0.05 )
s = rexp(n=n, rate=2 )
e_i = rnorm( n=n, mean=m, sd=s )
u_i = replicate( 
  n=n, exp=mean( rnorm( n=10, mean=m, sd=s ) ) )

param_pscale = a + aHS + bAHS + e_i + u_i
param_oscale = -1*param_pscale

par(mfrow=c(1,2))

dens( param_pscale, xlim=c(-7,7),
      show.HPDI=0.95,
      main='Intelligibility scale', 
      xlab=expression(SI[i]))

dens( param_oscale, xlim=c(0,1),
      show.HPDI=0.95,
      main='Entropy scale', 
      xlab=expression(H[wsib]) )
abline( v=c(0, 1), lty=2, col='gray')

par(mfrow=c(1,1))
1
package requirement
2
simulated sample size
3
general intercept prior distribution definition
4
hearing status effects prior distribution definition
5
chronological age per hearing status effects prior distribution definition
6
mean hyperparameter prior distribution definition
7
standard deviation hyperparameter prior distribution definition
8
child differences prior distribution definition
9
sentence-child differences effects prior distribution definition
10
Normal GLLAMM potential intelligibility prior distribution calculation: intelligibility scale
11
Normal GLLAMM potential intelligibility prior distribution definition: entropy scale
12
Normal GLLAMM potential intelligibility prior distribution density plot: unrestricted intelligibility scale
13
Normal GLLAMM potential intelligibility prior distribution density plot: bounded entropy scale

Figure 31: Normal GLLAMM, potential intelligibility distribution: intelligibility and entropy scale

In contrast, the right panel of Figure 32, demonstrates that when the prior of SI_{i} is transformed to the entropy scale under the beta-proportion GLLAMM, the model anticipates a high concentration of data at the extreme levels of entropy, with a lower probability for scores around mid-levels. Furthermore, the model does not expect data beyond the feasible range of the outcome.

Code
require(rethinking)

n = 1000

a = rnorm( n=n, mean=0, sd=0.05 )
aHS = rnorm( n=n, mean=0, sd=3 )
bAHS = rnorm( n=n, mean=0, sd=0.1 )

m = rnorm( n=n, mean=0, sd=0.05 )
s = rexp(n=n, rate=2 )
e_i = rnorm( n=n, mean=m, sd=s )
u_i = replicate( 
  n=n, exp=mean( rnorm( n=10, mean=m, sd=s ) ) )

param_pscale = a + aHS + bAHS + e_i + u_i
param_oscale = inv_logit( -1*param_pscale )

par(mfrow=c(1,2))

dens( param_pscale, xlim=c(-7,7),
      show.HPDI=0.95,
      main='Intelligibility scale', 
      xlab=expression(SI[si]))

dens( param_oscale, xlim=c(0,1),
      show.HPDI=0.95,
      main='Entropy scale', 
      xlab=expression(H[wsib]) )
abline( v=c(0, 1), lty=2, col='gray')

par(mfrow=c(1,1))
1
package requirement
2
simulated sample size
3
general intercept prior distribution definition
4
hearing status effects prior distribution definition
5
chronological age per hearing status effects prior distribution definition
6
mean hyperparameter prior distribution definition
7
standard deviation hyperparameter prior distribution definition
8
child differences prior distribution definition
9
sentence-child differences effects prior distribution definition
10
Beta-proportion GLLAMM potential intelligibility prior distribution calculation: intelligibility scale
11
Beta-proportion GLLAMM potential intelligibility prior distribution definition: entropy scale
12
Beta-proportion GLLAMM potential intelligibility prior distribution density plot: unrestricted intelligibility scale
13
Beta-proportion GLLAMM potential intelligibility prior distribution density plot: bounded entropy scale

Figure 32: Beta-proportion GLLAMM, potential intelligibility distribution: intelligibility and entropy scale

Estimation

The models were estimated using the software R version 4.2.2 (R Core Team, 2015) and STAN version 2.26.1 (Stan Development Team., 2021). R was utilized to handle the data and produce the outputs for the models. STAN was utilized to access the Bayesian estimation procedure. The procedure implemented in STAN was the Hamiltonian Monte Carlo (HMC), specifically, the No-U-Turn Sampler (NUTS) (Hoffman and Gelman, 2014) 8.

Software

Four Markov chains were implemented for each parameter. Each chain had distinct starting values. Due to the selected estimation procedure (HMC-NUTS), no burn-in or thinning procedure was necessary. Nevertheless, the procedure did required a warm-up phase. Consequently, each chain was run for 4,000 iterations, where the first 2,000 served as a warm-up phase and the remaining 2,000 were considered effective samples from the posterior distribution.

Stationarity, convergence and mixing evaluation

To evaluate the Markov chains’ performance in terms of achieving stationarity, convergence, and good mixing, visual inspections of the trace, trace-rank, and autocorrelation plots (ACF) were conducted, as recommended by McElreath (2020). Furthermore, the assessment of convergence was also supported by the potential scale reduction factor statistics (\hat{R}), developed by Gelman and colleagues (2014). Convergence was considered to be achieved if \hat{R} \leq 1.05.

Posterior information evaluation

To assess whether the parameter’s posterior distribution has been generated with a sufficient number of uncorrelated sampling points, the posterior distribution density is inspected. The number of uncorrelated sampling points indicates the amount of information held by the posterior distribution. Additionally, this assessment is complemented by the effective sample size statistics (n_{\text{eff}}) developed by Gelman and colleagues (2014).

Fitted models

Twelve statistical models were derived from the general descriptions of the normal and beta-proportion linear, latent and mixed models outlined in Section 7.1.1 and Section 7.1.2. The models differed in the response likelihood component, link function, and the structural equation part. For a comprehensive overview of the parametrization for the entire set of model refer to Table 2.

Of particular interest is the comparison of models one and seven. The selected model resulting from this comparison aims to address Research question 1 and Research question 2. Furthermore, the comparison and selection among all competing models, with particular interest in models six, nine and twelve, collectively serve to address Research question 3.

Table 2: Parametrization of fitted models.
Response Robust Block Fixed effects
Model distribution model effects \beta_{HS[i]} \beta_{A} \beta_{A,HS[i]}
1 Normal No Yes No No No
2 Normal No Yes Yes Yes No
3 Normal No Yes Yes No Yes
4 Normal Yes Yes No No No
5 Normal Yes Yes Yes Yes No
6 Normal Yes Yes Yes No Yes
7 Beta-prop. No Yes No No No
8 Beta-prop. No Yes Yes Yes No
9 Beta-prop. No Yes Yes No Yes
10 Beta-prop. Yes Yes No No No
11 Beta-prop. Yes Yes Yes Yes No
12 Beta-prop. Yes Yes Yes No Yes

The following tabset panel shows the STAN code of all fitted model. The code show commentaries for each section of the model. Furthermore, the models are implemented using non-centered priors (refer to Section 3.1.5).

```{r}

mcmc_code = "
data{

    // dimensions
    int N;                // number of experimental runs
    int B;                // max. number of blocks
    int I;                // max. number of experimental units (children)
    int U;                // max. number of sentences
    int W;                // max. number of words

    // category numbers
    int cHS;              // max. number of categories in HS
    
    // data
    array[N] int<lower=1, upper=B> bid;   // block id
    array[N] int<lower=1, upper=I> cid;   // child's id
    array[N] int<lower=1, upper=U> uid;   // sentence's id
    
    array[N] int<lower=1, upper=cHS> HS;  // hearing status
    array[N] real Am;     // chron. age - min( chron. age )
    array[N] real Hwsib;  // replicated entropies
    
}
parameters{
    
    // fixed effects parameters
    real a;               // intercept
    // vector[cHS] aHS;      // HS effects
    // real bAm;             // Am effects
    // vector[cHS] bAmHS;    // Am effects (per HS)

    // random effects parameters
    real m_b;             // block RE mean
    real<lower=0> s_b;    // block RE sd
    vector[B] z_b;        // non-centered block RE
    real m_i;             // child RE mean
    real<lower=0> s_i;    // child RE sd
    vector[I] z_i;        // non-centered child RE
    real m_u;             // child, utterance RE mean
    real<lower=0> s_u;    // child, utterance RE sd
    matrix[I,U] z_u;      // non-centered child, utterance RE
    
    // variability parameters
    real<lower=0> s_w;      // child, utterance, word sd (one overall)
    // vector<lower=0>[I] s_w; // child, utterance, word sd (one per child)
    
}
transformed parameters{
    
    // to track
    vector[B] b_i;        // block random effects
    vector[I] e_i;        // child random effects
    matrix[I,U] u_si;     // sentence random effects
    vector[I] u_i;        // sentence average random effects
    vector[I] SI;         // SI index
    vector[N] mu;         // NO TRACK
    
    // random effects
    b_i = m_b + s_b*z_b;  // non-centered block RE
    e_i = m_i + s_i*z_i;  // non-centered child RE
    u_si = m_u + s_u*z_u;  // non-centered utterance RE
    
    // intelligibility and average entropy
    for(i in 1:I){
      u_i[ i ] = mean( u_si[ i, ] );
    }
    
    for(n in 1:N){
      SI[ cid[n] ] = a + 
        // aHS[ HS[n] ] + 
        // bAm*Am[n] + 
        // bAmHS[ HS[n] ]*Am[n] + 
        e_i[ cid[n] ] +
        u_i[ cid[n] ];
      mu[n] = b_i[ bid[n] ] - SI[ cid[n] ];
    }

}
model{

    // fixed effects priors
    a ~ normal( 0 , 0.05 );
    // aHS ~ normal( 0 , 0.3 );
    // bAm ~ normal( 0 , 0.1 );
    // bAmHS ~ normal( 0 , 0.1 );
    
    // random effects priors
    m_b ~ normal( -0.5 , 0.05 );
    s_b ~ exponential( 2 );
    z_b ~ std_normal();
    m_i ~ normal( -0.5 , 0.05 );
    s_i ~ exponential( 2 );
    z_i ~ std_normal();
    m_u ~ normal( -0.5 , 0.05 );
    s_u ~ exponential( 2 );
    to_vector( z_u ) ~ std_normal();
    
    // variability priors
    s_w ~ exponential( 2 );

    // likelihood
    for(n in 1:N){
      Hwsib[n] ~ normal( mu[n] , s_w );
      // Hwsib[n] ~ normal( mu[n] , s_w[ cid[n] ] );
    }
    
}
generated quantities{

    // track
    vector[N] log_lik;
    
    // log-likelihood
    for(n in 1:N){
      log_lik[n] = normal_lpdf( Hwsib[n] | mu[n] , s_w );
      // log_lik[n] = normal_lpdf( Hwsib[n] | mu[n] , s_w[ cid[n] ] );
    }
    
}
"

model_nam = "model01.stan"
writeLines(mcmc_code, con=file.path(save_dir, 'real_models', model_nam) ) 

```
```{r}

mcmc_code = "
data{

    // dimensions
    int N;                // number of experimental runs
    int B;                // max. number of blocks
    int I;                // max. number of experimental units (children)
    int U;                // max. number of sentences
    int W;                // max. number of words

    // category numbers
    int cHS;              // max. number of categories in HS
    
    // data
    array[N] int<lower=1, upper=B> bid;   // block id
    array[N] int<lower=1, upper=I> cid;   // child's id
    array[N] int<lower=1, upper=U> uid;   // sentence's id
    
    array[N] int<lower=1, upper=cHS> HS;  // hearing status
    array[N] real Am;     // chron. age - min( chron. age )
    array[N] real Hwsib;  // replicated entropies
    
}
parameters{
    
    // fixed effects parameters
    real a;               // intercept
    vector[cHS] aHS;      // HS effects
    real bAm;             // Am effects
    // vector[cHS] bAmHS;    // Am effects (per HS)

    // random effects parameters
    real m_b;             // block RE mean
    real<lower=0> s_b;    // block RE sd
    vector[B] z_b;        // non-centered block RE
    real m_i;             // child RE mean
    real<lower=0> s_i;    // child RE sd
    vector[I] z_i;        // non-centered child RE
    real m_u;             // child, utterance RE mean
    real<lower=0> s_u;    // child, utterance RE sd
    matrix[I,U] z_u;      // non-centered child, utterance RE
    
    // variability parameters
    real<lower=0> s_w;      // child, utterance, word sd (one overall)
    // vector<lower=0>[I] s_w; // child, utterance, word sd (one per child)
    
}
transformed parameters{
    
    // to track
    vector[B] b_i;        // block random effects
    vector[I] e_i;        // child random effects
    matrix[I,U] u_si;     // sentence random effects
    vector[I] u_i;        // sentence average random effects
    vector[I] SI;         // SI index
    vector[N] mu;         // NO TRACK
    
    // random effects
    b_i = m_b + s_b*z_b;  // non-centered block RE
    e_i = m_i + s_i*z_i;  // non-centered child RE
    u_si = m_u + s_u*z_u;  // non-centered utterance RE
    
    // intelligibility and average entropy
    for(i in 1:I){
      u_i[ i ] = mean( u_si[ i, ] );
    }
    
    for(n in 1:N){
      SI[ cid[n] ] = a + 
        aHS[ HS[n] ] + 
        bAm*Am[n] + 
        // bAmHS[ HS[n] ]*Am[n] + 
        e_i[ cid[n] ] +
        u_i[ cid[n] ];
      mu[n] = b_i[ bid[n] ] - SI[ cid[n] ];
    }

}
model{

    // fixed effects priors
    a ~ normal( 0 , 0.05 );
    aHS ~ normal( 0 , 0.3 );
    bAm ~ normal( 0 , 0.1 );
    // bAmHS ~ normal( 0 , 0.1 );
    
    // random effects priors
    m_b ~ normal( -0.5 , 0.05 );
    s_b ~ exponential( 2 );
    z_b ~ std_normal();
    m_i ~ normal( -0.5 , 0.05 );
    s_i ~ exponential( 2 );
    z_i ~ std_normal();
    m_u ~ normal( -0.5 , 0.05 );
    s_u ~ exponential( 2 );
    to_vector( z_u ) ~ std_normal();
    
    // variability priors
    s_w ~ exponential( 2 );

    // likelihood
    for(n in 1:N){
      Hwsib[n] ~ normal( mu[n] , s_w );
      // Hwsib[n] ~ normal( mu[n] , s_w[ cid[n] ] );
    }
    
}
generated quantities{

    // track
    vector[N] log_lik;
    
    // log-likelihood
    for(n in 1:N){
      log_lik[n] = normal_lpdf( Hwsib[n] | mu[n] , s_w );
      // log_lik[n] = normal_lpdf( Hwsib[n] | mu[n] , s_w[ cid[n] ] );
    }
    
}
"

model_nam = "model02.stan" 
writeLines(mcmc_code, con=file.path(save_dir, 'real_models', model_nam) ) 

```
```{r}

mcmc_code = "
data{

    // dimensions
    int N;                // number of experimental runs
    int B;                // max. number of blocks
    int I;                // max. number of experimental units (children)
    int U;                // max. number of sentences
    int W;                // max. number of words

    // category numbers
    int cHS;              // max. number of categories in HS
    
    // data
    array[N] int<lower=1, upper=B> bid;   // block id
    array[N] int<lower=1, upper=I> cid;   // child's id
    array[N] int<lower=1, upper=U> uid;   // sentence's id
    
    array[N] int<lower=1, upper=cHS> HS;  // hearing status
    array[N] real Am;     // chron. age - min( chron. age )
    array[N] real Hwsib;  // replicated entropies
    
}
parameters{
    
    // fixed effects parameters
    real a;               // intercept
    vector[cHS] aHS;      // HS effects
    // real bAm;             // Am effects
    vector[cHS] bAmHS;    // Am effects (per HS)

    // random effects parameters
    real m_b;             // block RE mean
    real<lower=0> s_b;    // block RE sd
    vector[B] z_b;        // non-centered block RE
    real m_i;             // child RE mean
    real<lower=0> s_i;    // child RE sd
    vector[I] z_i;        // non-centered child RE
    real m_u;             // child, utterance RE mean
    real<lower=0> s_u;    // child, utterance RE sd
    matrix[I,U] z_u;      // non-centered child, utterance RE
    
    // variability parameters
    real<lower=0> s_w;      // child, utterance, word sd (one overall)
    // vector<lower=0>[I] s_w; // child, utterance, word sd (one per child)
    
}
transformed parameters{
    
    // to track
    vector[B] b_i;        // block random effects
    vector[I] e_i;        // child random effects
    matrix[I,U] u_si;     // sentence random effects
    vector[I] u_i;        // sentence average random effects
    vector[I] SI;         // SI index
    vector[N] mu;         // NO TRACK
    
    // random effects
    b_i = m_b + s_b*z_b;  // non-centered block RE
    e_i = m_i + s_i*z_i;  // non-centered child RE
    u_si = m_u + s_u*z_u;  // non-centered utterance RE
    
    // intelligibility and average entropy
    for(i in 1:I){
      u_i[ i ] = mean( u_si[ i, ] );
    }
    
    for(n in 1:N){
      SI[ cid[n] ] = a + 
        aHS[ HS[n] ] + 
        // bAm*Am[n] + 
        bAmHS[ HS[n] ]*Am[n] + 
        e_i[ cid[n] ] +
        u_i[ cid[n] ];
      mu[n] = b_i[ bid[n] ] - SI[ cid[n] ];
    }

}
model{

    // fixed effects priors
    a ~ normal( 0 , 0.05 );
    aHS ~ normal( 0 , 0.3 );
    // bAm ~ normal( 0 , 0.1 );
    bAmHS ~ normal( 0 , 0.1 );
    
    // random effects priors
    m_b ~ normal( -0.5 , 0.05 );
    s_b ~ exponential( 2 );
    z_b ~ std_normal();
    m_i ~ normal( -0.5 , 0.05 );
    s_i ~ exponential( 2 );
    z_i ~ std_normal();
    m_u ~ normal( -0.5 , 0.05 );
    s_u ~ exponential( 2 );
    to_vector( z_u ) ~ std_normal();
    
    // variability priors
    s_w ~ exponential( 2 );

    // likelihood
    for(n in 1:N){
      Hwsib[n] ~ normal( mu[n] , s_w );
      // Hwsib[n] ~ normal( mu[n] , s_w[ cid[n] ] );
    }
    
}
generated quantities{

    // track
    vector[N] log_lik;
    
    // log-likelihood
    for(n in 1:N){
      log_lik[n] = normal_lpdf( Hwsib[n] | mu[n] , s_w );
      // log_lik[n] = normal_lpdf( Hwsib[n] | mu[n] , s_w[ cid[n] ] );
    }
    
}
"

model_nam = "model03.stan" 
writeLines(mcmc_code, con=file.path(save_dir, 'real_models', model_nam) ) 

```
```{r}

mcmc_code = "
data{

    // dimensions
    int N;                // number of experimental runs
    int B;                // max. number of blocks
    int I;                // max. number of experimental units (children)
    int U;                // max. number of sentences
    int W;                // max. number of words

    // category numbers
    int cHS;              // max. number of categories in HS
    
    // data
    array[N] int<lower=1, upper=B> bid;   // block id
    array[N] int<lower=1, upper=I> cid;   // child's id
    array[N] int<lower=1, upper=U> uid;   // sentence's id
    
    array[N] int<lower=1, upper=cHS> HS;  // hearing status
    array[N] real Am;     // chron. age - min( chron. age )
    array[N] real Hwsib;  // replicated entropies
    
}
parameters{
    
    // fixed effects parameters
    real a;               // intercept
    // vector[cHS] aHS;      // HS effects
    // real bAm;             // Am effects
    // vector[cHS] bAmHS;    // Am effects (per HS)

    // random effects parameters
    real m_b;             // block RE mean
    real<lower=0> s_b;    // block RE sd
    vector[B] z_b;        // non-centered block RE
    real m_i;             // child RE mean
    real<lower=0> s_i;    // child RE sd
    vector[I] z_i;        // non-centered child RE
    real m_u;             // child, utterance RE mean
    real<lower=0> s_u;    // child, utterance RE sd
    matrix[I,U] z_u;      // non-centered child, utterance RE
    
    // variability parameters
    // real<lower=0> s_w;      // child, utterance, word sd (one overall)
    vector<lower=0>[I] s_w; // child, utterance, word sd (one per child)
    
}
transformed parameters{
    
    // to track
    vector[B] b_i;        // block random effects
    vector[I] e_i;        // child random effects
    matrix[I,U] u_si;     // sentence random effects
    vector[I] u_i;        // sentence average random effects
    vector[I] SI;         // SI index
    vector[N] mu;         // NO TRACK
    
    // random effects
    b_i = m_b + s_b*z_b;  // non-centered block RE
    e_i = m_i + s_i*z_i;  // non-centered child RE
    u_si = m_u + s_u*z_u;  // non-centered utterance RE
    
    // intelligibility and average entropy
    for(i in 1:I){
      u_i[ i ] = mean( u_si[ i, ] );
    }
    
    for(n in 1:N){
      SI[ cid[n] ] = a + 
        // aHS[ HS[n] ] + 
        // bAm*Am[n] + 
        // bAmHS[ HS[n] ]*Am[n] + 
        e_i[ cid[n] ] +
        u_i[ cid[n] ];
      mu[n] = b_i[ bid[n] ] - SI[ cid[n] ];
    }

}
model{

    // fixed effects priors
    a ~ normal( 0 , 0.05 );
    // aHS ~ normal( 0 , 0.3 );
    // bAm ~ normal( 0 , 0.1 );
    // bAmHS ~ normal( 0 , 0.1 );
    
    // random effects priors
    m_b ~ normal( -0.5 , 0.05 );
    s_b ~ exponential( 2 );
    z_b ~ std_normal();
    m_i ~ normal( -0.5 , 0.05 );
    s_i ~ exponential( 2 );
    z_i ~ std_normal();
    m_u ~ normal( -0.5 , 0.05 );
    s_u ~ exponential( 2 );
    to_vector( z_u ) ~ std_normal();
    
    // variability priors
    s_w ~ exponential( 2 );

    // likelihood
    for(n in 1:N){
      // Hwsib[n] ~ normal( mu[n] , s_w );
      Hwsib[n] ~ normal( mu[n] , s_w[ cid[n] ] );
    }
    
}
generated quantities{

    // track
    vector[N] log_lik;
    
    // log-likelihood
    for(n in 1:N){
      // log_lik[n] = normal_lpdf( Hwsib[n] | mu[n] , s_w );
      log_lik[n] = normal_lpdf( Hwsib[n] | mu[n] , s_w[ cid[n] ] );
    }
    
}
"

model_nam = "model04.stan" 
writeLines(mcmc_code, con=file.path(save_dir, 'real_models', model_nam) ) 

```
```{r}

mcmc_code = "
data{

    // dimensions
    int N;                // number of experimental runs
    int B;                // max. number of blocks
    int I;                // max. number of experimental units (children)
    int U;                // max. number of sentences
    int W;                // max. number of words

    // category numbers
    int cHS;              // max. number of categories in HS
    
    // data
    array[N] int<lower=1, upper=B> bid;   // block id
    array[N] int<lower=1, upper=I> cid;   // child's id
    array[N] int<lower=1, upper=U> uid;   // sentence's id
    
    array[N] int<lower=1, upper=cHS> HS;  // hearing status
    array[N] real Am;     // chron. age - min( chron. age )
    array[N] real Hwsib;  // replicated entropies
    
}
parameters{
    
    // fixed effects parameters
    real a;               // intercept
    vector[cHS] aHS;      // HS effects
    real bAm;             // Am effects
    // vector[cHS] bAmHS;    // Am effects (per HS)

    // random effects parameters
    real m_b;             // block RE mean
    real<lower=0> s_b;    // block RE sd
    vector[B] z_b;        // non-centered block RE
    real m_i;             // child RE mean
    real<lower=0> s_i;    // child RE sd
    vector[I] z_i;        // non-centered child RE
    real m_u;             // child, utterance RE mean
    real<lower=0> s_u;    // child, utterance RE sd
    matrix[I,U] z_u;      // non-centered child, utterance RE
    
    // variability parameters
    // real<lower=0> s_w;      // child, utterance, word sd (one overall)
    vector<lower=0>[I] s_w; // child, utterance, word sd (one per child)
    
}
transformed parameters{
    
    // to track
    vector[B] b_i;        // block random effects
    vector[I] e_i;        // child random effects
    matrix[I,U] u_si;     // sentence random effects
    vector[I] u_i;        // sentence average random effects
    vector[I] SI;         // SI index
    vector[N] mu;         // NO TRACK
    
    // random effects
    b_i = m_b + s_b*z_b;  // non-centered block RE
    e_i = m_i + s_i*z_i;  // non-centered child RE
    u_si = m_u + s_u*z_u;  // non-centered utterance RE
    
    // intelligibility and average entropy
    for(i in 1:I){
      u_i[ i ] = mean( u_si[ i, ] );
    }
    
    for(n in 1:N){
      SI[ cid[n] ] = a + 
        aHS[ HS[n] ] + 
        bAm*Am[n] + 
        // bAmHS[ HS[n] ]*Am[n] + 
        e_i[ cid[n] ] +
        u_i[ cid[n] ];
      mu[n] = b_i[ bid[n] ] - SI[ cid[n] ];
    }

}
model{

    // fixed effects priors
    a ~ normal( 0 , 0.05 );
    aHS ~ normal( 0 , 0.3 );
    bAm ~ normal( 0 , 0.1 );
    // bAmHS ~ normal( 0 , 0.1 );
    
    // random effects priors
    m_b ~ normal( -0.5 , 0.05 );
    s_b ~ exponential( 2 );
    z_b ~ std_normal();
    m_i ~ normal( -0.5 , 0.05 );
    s_i ~ exponential( 2 );
    z_i ~ std_normal();
    m_u ~ normal( -0.5 , 0.05 );
    s_u ~ exponential( 2 );
    to_vector( z_u ) ~ std_normal();
    
    // variability priors
    s_w ~ exponential( 2 );

    // likelihood
    for(n in 1:N){
      // Hwsib[n] ~ normal( mu[n] , s_w );
      Hwsib[n] ~ normal( mu[n] , s_w[ cid[n] ] );
    }
    
}
generated quantities{

    // track
    vector[N] log_lik;
    
    // log-likelihood
    for(n in 1:N){
      // log_lik[n] = normal_lpdf( Hwsib[n] | mu[n] , s_w );
      log_lik[n] = normal_lpdf( Hwsib[n] | mu[n] , s_w[ cid[n] ] );
    }
    
}
"

model_nam = "model05.stan"
writeLines(mcmc_code, con=file.path(save_dir, 'real_models', model_nam) )

```
```{r}

mcmc_code = "
data{

    // dimensions
    int N;                // number of experimental runs
    int B;                // max. number of blocks
    int I;                // max. number of experimental units (children)
    int U;                // max. number of sentences
    int W;                // max. number of words

    // category numbers
    int cHS;              // max. number of categories in HS
    
    // data
    array[N] int<lower=1, upper=B> bid;   // block id
    array[N] int<lower=1, upper=I> cid;   // child's id
    array[N] int<lower=1, upper=U> uid;   // sentence's id
    
    array[N] int<lower=1, upper=cHS> HS;  // hearing status
    array[N] real Am;     // chron. age - min( chron. age )
    array[N] real Hwsib;  // replicated entropies
    
}
parameters{
    
    // fixed effects parameters
    real a;               // intercept
    vector[cHS] aHS;      // HS effects
    // real bAm;             // Am effects
    vector[cHS] bAmHS;    // Am effects (per HS)

    // random effects parameters
    real m_b;             // block RE mean
    real<lower=0> s_b;    // block RE sd
    vector[B] z_b;        // non-centered block RE
    real m_i;             // child RE mean
    real<lower=0> s_i;    // child RE sd
    vector[I] z_i;        // non-centered child RE
    real m_u;             // child, utterance RE mean
    real<lower=0> s_u;    // child, utterance RE sd
    matrix[I,U] z_u;      // non-centered child, utterance RE
    
    // variability parameters
    // real<lower=0> s_w;      // child, utterance, word sd (one overall)
    vector<lower=0>[I] s_w; // child, utterance, word sd (one per child)
    
}
transformed parameters{
    
    // to track
    vector[B] b_i;        // block random effects
    vector[I] e_i;        // child random effects
    matrix[I,U] u_si;     // sentence random effects
    vector[I] u_i;        // sentence average random effects
    vector[I] SI;         // SI index
    vector[N] mu;         // NO TRACK
    
    // random effects
    b_i = m_b + s_b*z_b;  // non-centered block RE
    e_i = m_i + s_i*z_i;  // non-centered child RE
    u_si = m_u + s_u*z_u;  // non-centered utterance RE
    
    // intelligibility and average entropy
    for(i in 1:I){
      u_i[ i ] = mean( u_si[ i, ] );
    }
    
    for(n in 1:N){
      SI[ cid[n] ] = a + 
        aHS[ HS[n] ] + 
        // bAm*Am[n] + 
        bAmHS[ HS[n] ]*Am[n] + 
        e_i[ cid[n] ] +
        u_i[ cid[n] ];
      mu[n] = b_i[ bid[n] ] - SI[ cid[n] ];
    }

}
model{

    // fixed effects priors
    a ~ normal( 0 , 0.05 );
    aHS ~ normal( 0 , 0.3 );
    // bAm ~ normal( 0 , 0.1 );
    bAmHS ~ normal( 0 , 0.1 );
    
    // random effects priors
    m_b ~ normal( -0.5 , 0.05 );
    s_b ~ exponential( 2 );
    z_b ~ std_normal();
    m_i ~ normal( -0.5 , 0.05 );
    s_i ~ exponential( 2 );
    z_i ~ std_normal();
    m_u ~ normal( -0.5 , 0.05 );
    s_u ~ exponential( 2 );
    to_vector( z_u ) ~ std_normal();
    
    // variability priors
    s_w ~ exponential( 2 );

    // likelihood
    for(n in 1:N){
      // Hwsib[n] ~ normal( mu[n] , s_w );
      Hwsib[n] ~ normal( mu[n] , s_w[ cid[n] ] );
    }
    
}
generated quantities{

    // track
    vector[N] log_lik;
    
    // log-likelihood
    for(n in 1:N){
      // log_lik[n] = normal_lpdf( Hwsib[n] | mu[n] , s_w );
      log_lik[n] = normal_lpdf( Hwsib[n] | mu[n] , s_w[ cid[n] ] );
    }
    
}
"

model_nam = "model06.stan"
writeLines(mcmc_code, con=file.path(save_dir, 'real_models', model_nam) ) 

```
```{r}

mcmc_code = "
data{

    // dimensions
    int N;                // number of experimental runs
    int B;                // max. number of blocks
    int I;                // max. number of experimental units (children)
    int U;                // max. number of sentences
    int W;                // max. number of words

    // category numbers
    int cHS;              // max. number of categories in HS
    
    // data
    array[N] int<lower=1, upper=B> bid;   // block id
    array[N] int<lower=1, upper=I> cid;   // child's id
    array[N] int<lower=1, upper=U> uid;   // sentence's id
    
    array[N] int<lower=1, upper=cHS> HS;  // hearing status
    array[N] real Am;     // chron. age - min( chron. age )
    array[N] real Hwsib;  // replicated entropies
    
}
parameters{
    
    // fixed effects parameters
    real a;               // intercept
    // vector[cHS] aHS;      // HS effects
    // real bAm;             // Am effects
    // vector[cHS] bAmHS;    // Am effects (per HS)

    // random effects parameters
    real m_b;             // block RE mean
    real<lower=0> s_b;    // block RE sd
    vector[B] z_b;        // non-centered block RE
    real m_i;             // child RE mean
    real<lower=0> s_i;    // child RE sd
    vector[I] z_i;        // non-centered child RE
    real m_u;             // child, utterance RE mean
    real<lower=0> s_u;    // child, utterance RE sd
    matrix[I,U] z_u;      // non-centered child, utterance RE
    
    // variability parameters
    real<lower=0> Mw;     // 'sample size' parameter
    // vector<lower=0>[I] Mw; // 'sample size' parameter
    
}
transformed parameters{
    
    // to track
    vector[B] b_i;        // block random effects
    vector[I] e_i;        // child random effects
    matrix[I,U] u_si;     // sentence random effects
    vector[I] u_i;        // sentence average random effects
    vector[I] SI;         // SI index
    vector[N] mu;         // NO TRACK
    
    // random effects
    b_i = m_b + s_b*z_b;  // non-centered block RE
    e_i = m_i + s_i*z_i;  // non-centered child RE
    u_si = m_u + s_u*z_u;  // non-centered utterance RE
    
    // intelligibility and average entropy
    for(i in 1:I){
      u_i[ i ] = mean( u_si[ i, ] );
    }
    
    for(n in 1:N){
      SI[ cid[n] ] = a + 
        // aHS[ HS[n] ] + 
        // bAm*Am[n] + 
        // bAmHS[ HS[n] ]*Am[n] + 
        e_i[ cid[n] ] +
        u_i[ cid[n] ];
      mu[n] = inv_logit( b_i[ bid[n] ] - SI[ cid[n] ] );
    }

}
model{

    // fixed effects priors
    a ~ normal( 0 , 0.05 );
    // aHS ~ normal( 0 , 0.3 );
    // bAm ~ normal( 0 , 0.1 );
    // bAmHS ~ normal( 0 , 0.1 );
    
    // random effects priors
    m_b ~ normal( 0 , 0.05 );
    s_b ~ exponential( 2 );
    z_b ~ std_normal();
    m_i ~ normal( 0 , 0.05 );
    s_i ~ exponential( 2 );
    z_i ~ std_normal();
    m_u ~ normal( 0 , 0.05 );
    s_u ~ exponential( 2 );
    to_vector( z_u ) ~ std_normal();
    
    // variability priors
    Mw ~ exponential( 0.5 );

    // likelihood
    for(n in 1:N){
      Hwsib[n] ~ beta_proportion( mu[n] , Mw );
      // Hwsib[n] ~ beta_proportion( mu[n] , Mw[ cid[n] ] );
    }
    
}
generated quantities{

    // track
    vector[N] log_lik;
    
    // log-likelihood
    for(n in 1:N){
      log_lik[n] = beta_proportion_lpdf( Hwsib[n] | mu[n] , Mw );
      // log_lik[n] = beta_proportion_lpdf( Hwsib[n] | mu[n] , Mw[ cid[n] ] );
    }
    
}
"

model_nam = "model07.stan"
writeLines(mcmc_code, con=file.path(save_dir, 'real_models', model_nam) ) 

```
```{r}

mcmc_code = "
data{

    // dimensions
    int N;                // number of experimental runs
    int B;                // max. number of blocks
    int I;                // max. number of experimental units (children)
    int U;                // max. number of sentences
    int W;                // max. number of words

    // category numbers
    int cHS;              // max. number of categories in HS
    
    // data
    array[N] int<lower=1, upper=B> bid;   // block id
    array[N] int<lower=1, upper=I> cid;   // child's id
    array[N] int<lower=1, upper=U> uid;   // sentence's id
    
    array[N] int<lower=1, upper=cHS> HS;  // hearing status
    array[N] real Am;     // chron. age - min( chron. age )
    array[N] real Hwsib;  // replicated entropies
    
}
parameters{
    
    // fixed effects parameters
    real a;               // intercept
    vector[cHS] aHS;      // HS effects
    real bAm;             // Am effects
    // vector[cHS] bAmHS;    // Am effects (per HS)

    // random effects parameters
    real m_b;             // block RE mean
    real<lower=0> s_b;    // block RE sd
    vector[B] z_b;        // non-centered block RE
    real m_i;             // child RE mean
    real<lower=0> s_i;    // child RE sd
    vector[I] z_i;        // non-centered child RE
    real m_u;             // child, utterance RE mean
    real<lower=0> s_u;    // child, utterance RE sd
    matrix[I,U] z_u;      // non-centered child, utterance RE
    
    // variability parameters
    real<lower=0> Mw;     // 'sample size' parameter
    // vector<lower=0>[I] Mw; // 'sample size' parameter
    
}
transformed parameters{
    
    // to track
    vector[B] b_i;        // block random effects
    vector[I] e_i;        // child random effects
    matrix[I,U] u_si;     // sentence random effects
    vector[I] u_i;        // sentence average random effects
    vector[I] SI;         // SI index
    vector[N] mu;         // NO TRACK
    
    // random effects
    b_i = m_b + s_b*z_b;  // non-centered block RE
    e_i = m_i + s_i*z_i;  // non-centered child RE
    u_si = m_u + s_u*z_u;  // non-centered utterance RE
    
    // intelligibility and average entropy
    for(i in 1:I){
      u_i[ i ] = mean( u_si[ i, ] );
    }
    
    for(n in 1:N){
      SI[ cid[n] ] = a + 
        aHS[ HS[n] ] + 
        bAm*Am[n] + 
        // bAmHS[ HS[n] ]*Am[n] + 
        e_i[ cid[n] ] +
        u_i[ cid[n] ];
      mu[n] = inv_logit( b_i[ bid[n] ] - SI[ cid[n] ] );
    }

}
model{

    // fixed effects priors
    a ~ normal( 0 , 0.05 );
    aHS ~ normal( 0 , 0.3 );
    bAm ~ normal( 0 , 0.1 );
    // bAmHS ~ normal( 0 , 0.1 );
    
    // random effects priors
    m_b ~ normal( 0 , 0.05 );
    s_b ~ exponential( 2 );
    z_b ~ std_normal();
    m_i ~ normal( 0 , 0.05 );
    s_i ~ exponential( 2 );
    z_i ~ std_normal();
    m_u ~ normal( 0 , 0.05 );
    s_u ~ exponential( 2 );
    to_vector( z_u ) ~ std_normal();
    
    // variability priors
    Mw ~ exponential( 0.5 );

    // likelihood
    for(n in 1:N){
      Hwsib[n] ~ beta_proportion( mu[n] , Mw );
      // Hwsib[n] ~ beta_proportion( mu[n] , Mw[ cid[n] ] );
    }
    
}
generated quantities{

    // track
    vector[N] log_lik;
    
    // log-likelihood
    for(n in 1:N){
      log_lik[n] = beta_proportion_lpdf( Hwsib[n] | mu[n] , Mw );
      // log_lik[n] = beta_proportion_lpdf( Hwsib[n] | mu[n] , Mw[ cid[n] ] );
    }
    
}
"

model_nam = "model08.stan" 
writeLines(mcmc_code, con=file.path(save_dir, 'real_models', model_nam) )

```
```{r}

mcmc_code = "
data{

    // dimensions
    int N;                // number of experimental runs
    int B;                // max. number of blocks
    int I;                // max. number of experimental units (children)
    int U;                // max. number of sentences
    int W;                // max. number of words

    // category numbers
    int cHS;              // max. number of categories in HS
    
    // data
    array[N] int<lower=1, upper=B> bid;   // block id
    array[N] int<lower=1, upper=I> cid;   // child's id
    array[N] int<lower=1, upper=U> uid;   // sentence's id
    
    array[N] int<lower=1, upper=cHS> HS;  // hearing status
    array[N] real Am;     // chron. age - min( chron. age )
    array[N] real Hwsib;  // replicated entropies
    
}
parameters{
    
    // fixed effects parameters
    real a;               // intercept
    vector[cHS] aHS;      // HS effects
    // real bAm;             // Am effects
    vector[cHS] bAmHS;    // Am effects (per HS)

    // random effects parameters
    real m_b;             // block RE mean
    real<lower=0> s_b;    // block RE sd
    vector[B] z_b;        // non-centered block RE
    real m_i;             // child RE mean
    real<lower=0> s_i;    // child RE sd
    vector[I] z_i;        // non-centered child RE
    real m_u;             // child, utterance RE mean
    real<lower=0> s_u;    // child, utterance RE sd
    matrix[I,U] z_u;      // non-centered child, utterance RE
    
    // variability parameters
    real<lower=0> Mw;     // 'sample size' parameter
    // vector<lower=0>[I] Mw; // 'sample size' parameter
    
}
transformed parameters{
    
    // to track
    vector[B] b_i;        // block random effects
    vector[I] e_i;        // child random effects
    matrix[I,U] u_si;     // sentence random effects
    vector[I] u_i;        // sentence average random effects
    vector[I] SI;         // SI index
    vector[N] mu;         // NO TRACK
    
    // random effects
    b_i = m_b + s_b*z_b;  // non-centered block RE
    e_i = m_i + s_i*z_i;  // non-centered child RE
    u_si = m_u + s_u*z_u;  // non-centered utterance RE
    
    // intelligibility and average entropy
    for(i in 1:I){
      u_i[ i ] = mean( u_si[ i, ] );
    }
    
    for(n in 1:N){
      SI[ cid[n] ] = a + 
        aHS[ HS[n] ] + 
        // bAm*Am[n] + 
        bAmHS[ HS[n] ]*Am[n] + 
        e_i[ cid[n] ] +
        u_i[ cid[n] ];
      mu[n] = inv_logit( b_i[ bid[n] ] - SI[ cid[n] ] );
    }

}
model{

    // fixed effects priors
    a ~ normal( 0 , 0.05 );
    aHS ~ normal( 0 , 0.3 );
    // bAm ~ normal( 0 , 0.1 );
    bAmHS ~ normal( 0 , 0.1 );
    
    // random effects priors
    m_b ~ normal( 0 , 0.05 );
    s_b ~ exponential( 2 );
    z_b ~ std_normal();
    m_i ~ normal( 0 , 0.05 );
    s_i ~ exponential( 2 );
    z_i ~ std_normal();
    m_u ~ normal( 0 , 0.05 );
    s_u ~ exponential( 2 );
    to_vector( z_u ) ~ std_normal();
    
    // variability priors
    Mw ~ exponential( 0.5 );

    // likelihood
    for(n in 1:N){
      Hwsib[n] ~ beta_proportion( mu[n] , Mw );
      // Hwsib[n] ~ beta_proportion( mu[n] , Mw[ cid[n] ] );
    }
    
}
generated quantities{

    // track
    vector[N] log_lik;
    
    // log-likelihood
    for(n in 1:N){
      log_lik[n] = beta_proportion_lpdf( Hwsib[n] | mu[n] , Mw );
      // log_lik[n] = beta_proportion_lpdf( Hwsib[n] | mu[n] , Mw[ cid[n] ] );
    }
    
}
"

model_nam = "model09.stan"
writeLines(mcmc_code, con=file.path(save_dir, 'real_models', model_nam) ) 

```
```{r}

mcmc_code = "
data{

    // dimensions
    int N;                // number of experimental runs
    int B;                // max. number of blocks
    int I;                // max. number of experimental units (children)
    int U;                // max. number of sentences
    int W;                // max. number of words

    // category numbers
    int cHS;              // max. number of categories in HS
    
    // data
    array[N] int<lower=1, upper=B> bid;   // block id
    array[N] int<lower=1, upper=I> cid;   // child's id
    array[N] int<lower=1, upper=U> uid;   // sentence's id
    
    array[N] int<lower=1, upper=cHS> HS;  // hearing status
    array[N] real Am;     // chron. age - min( chron. age )
    array[N] real Hwsib;  // replicated entropies
    
}
parameters{
    
    // fixed effects parameters
    real a;               // intercept
    // vector[cHS] aHS;      // HS effects
    // real bAm;             // Am effects
    // vector[cHS] bAmHS;    // Am effects (per HS)

    // random effects parameters
    real m_b;             // block RE mean
    real<lower=0> s_b;    // block RE sd
    vector[B] z_b;        // non-centered block RE
    real m_i;             // child RE mean
    real<lower=0> s_i;    // child RE sd
    vector[I] z_i;        // non-centered child RE
    real m_u;             // child, utterance RE mean
    real<lower=0> s_u;    // child, utterance RE sd
    matrix[I,U] z_u;      // non-centered child, utterance RE
    
    // variability parameters
    // real<lower=0> Mw;    // 'sample size' parameter
    vector<lower=0>[I] Mw; // 'sample size' parameter
    
}
transformed parameters{
    
    // to track
    vector[B] b_i;        // block random effects
    vector[I] e_i;        // child random effects
    matrix[I,U] u_si;     // sentence random effects
    vector[I] u_i;        // sentence average random effects
    vector[I] SI;         // SI index
    vector[N] mu;         // NO TRACK
    
    // random effects
    b_i = m_b + s_b*z_b;  // non-centered block RE
    e_i = m_i + s_i*z_i;  // non-centered child RE
    u_si = m_u + s_u*z_u;  // non-centered utterance RE
    
    // intelligibility and average entropy
    for(i in 1:I){
      u_i[ i ] = mean( u_si[ i, ] );
    }
    
    for(n in 1:N){
      SI[ cid[n] ] = a + 
        // aHS[ HS[n] ] + 
        // bAm*Am[n] + 
        // bAmHS[ HS[n] ]*Am[n] + 
        e_i[ cid[n] ] +
        u_i[ cid[n] ];
      mu[n] = inv_logit( b_i[ bid[n] ] - SI[ cid[n] ] );
    }

}
model{

    // fixed effects priors
    a ~ normal( 0 , 0.05 );
    // aHS ~ normal( 0 , 0.3 );
    // bAm ~ normal( 0 , 0.1 );
    // bAmHS ~ normal( 0 , 0.1 );
    
    // random effects priors
    m_b ~ normal( 0 , 0.05 );
    s_b ~ exponential( 2 );
    z_b ~ std_normal();
    m_i ~ normal( 0 , 0.05 );
    s_i ~ exponential( 2 );
    z_i ~ std_normal();
    m_u ~ normal( 0 , 0.05 );
    s_u ~ exponential( 2 );
    to_vector( z_u ) ~ std_normal();
    
    // variability priors
    Mw ~ exponential( 0.5 );

    // likelihood
    for(n in 1:N){
      // Hwsib[n] ~ beta_proportion( mu[n] , Mw );
      Hwsib[n] ~ beta_proportion( mu[n] , Mw[ cid[n] ] );
    }
    
}
generated quantities{

    // track
    vector[N] log_lik;
    
    // log-likelihood
    for(n in 1:N){
      // log_lik[n] = beta_proportion_lpdf( Hwsib[n] | mu[n] , Mw );
      log_lik[n] = beta_proportion_lpdf( Hwsib[n] | mu[n] , Mw[ cid[n] ] );
    }
    
}
"

model_nam = "model10.stan"
writeLines(mcmc_code, con=file.path(save_dir, 'real_models', model_nam) )

```
```{r}

mcmc_code = "
data{

    // dimensions
    int N;                // number of experimental runs
    int B;                // max. number of blocks
    int I;                // max. number of experimental units (children)
    int U;                // max. number of sentences
    int W;                // max. number of words

    // category numbers
    int cHS;              // max. number of categories in HS
    
    // data
    array[N] int<lower=1, upper=B> bid;   // block id
    array[N] int<lower=1, upper=I> cid;   // child's id
    array[N] int<lower=1, upper=U> uid;   // sentence's id
    
    array[N] int<lower=1, upper=cHS> HS;  // hearing status
    array[N] real Am;     // chron. age - min( chron. age )
    array[N] real Hwsib;  // replicated entropies
    
}
parameters{
    
    // fixed effects parameters
    real a;               // intercept
    vector[cHS] aHS;      // HS effects
    real bAm;             // Am effects
    // vector[cHS] bAmHS;    // Am effects (per HS)

    // random effects parameters
    real m_b;             // block RE mean
    real<lower=0> s_b;    // block RE sd
    vector[B] z_b;        // non-centered block RE
    real m_i;             // child RE mean
    real<lower=0> s_i;    // child RE sd
    vector[I] z_i;        // non-centered child RE
    real m_u;             // child, utterance RE mean
    real<lower=0> s_u;    // child, utterance RE sd
    matrix[I,U] z_u;      // non-centered child, utterance RE
    
    // variability parameters
    // real<lower=0> Mw;     // 'sample size' parameter
    vector<lower=0>[I] Mw; // 'sample size' parameter
    
}
transformed parameters{
    
    // to track
    vector[B] b_i;        // block random effects
    vector[I] e_i;        // child random effects
    matrix[I,U] u_si;     // sentence random effects
    vector[I] u_i;        // sentence average random effects
    vector[I] SI;         // SI index
    vector[N] mu;         // NO TRACK
    
    // random effects
    b_i = m_b + s_b*z_b;  // non-centered block RE
    e_i = m_i + s_i*z_i;  // non-centered child RE
    u_si = m_u + s_u*z_u;  // non-centered utterance RE
    
    // intelligibility and average entropy
    for(i in 1:I){
      u_i[ i ] = mean( u_si[ i, ] );
    }
    
    for(n in 1:N){
      SI[ cid[n] ] = a + 
        aHS[ HS[n] ] + 
        bAm*Am[n] + 
        // bAmHS[ HS[n] ]*Am[n] + 
        e_i[ cid[n] ] +
        u_i[ cid[n] ];
      mu[n] = inv_logit( b_i[ bid[n] ] - SI[ cid[n] ] );
    }

}
model{

    // fixed effects priors
    a ~ normal( 0 , 0.05 );
    aHS ~ normal( 0 , 0.3 );
    bAm ~ normal( 0 , 0.1 );
    // bAmHS ~ normal( 0 , 0.1 );
    
    // random effects priors
    m_b ~ normal( 0 , 0.05 );
    s_b ~ exponential( 2 );
    z_b ~ std_normal();
    m_i ~ normal( 0 , 0.05 );
    s_i ~ exponential( 2 );
    z_i ~ std_normal();
    m_u ~ normal( 0 , 0.05 );
    s_u ~ exponential( 2 );
    to_vector( z_u ) ~ std_normal();
    
    // variability priors
    Mw ~ exponential( 0.5 );

    // likelihood
    for(n in 1:N){
      // Hwsib[n] ~ beta_proportion( mu[n] , Mw );
      Hwsib[n] ~ beta_proportion( mu[n] , Mw[ cid[n] ] );
    }
    
}
generated quantities{

    // track
    vector[N] log_lik;
    
    // log-likelihood
    for(n in 1:N){
      // log_lik[n] = beta_proportion_lpdf( Hwsib[n] | mu[n] , Mw );
      log_lik[n] = beta_proportion_lpdf( Hwsib[n] | mu[n] , Mw[ cid[n] ] );
    }
    
}
"

model_nam = "model11.stan"
writeLines(mcmc_code, con=file.path(save_dir, 'real_models', model_nam) ) 

```
```{r}

mcmc_code = "
data{

    // dimensions
    int N;                // number of experimental runs
    int B;                // max. number of blocks
    int I;                // max. number of experimental units (children)
    int U;                // max. number of sentences
    int W;                // max. number of words

    // category numbers
    int cHS;              // max. number of categories in HS
    
    // data
    array[N] int<lower=1, upper=B> bid;   // block id
    array[N] int<lower=1, upper=I> cid;   // child's id
    array[N] int<lower=1, upper=U> uid;   // sentence's id
    
    array[N] int<lower=1, upper=cHS> HS;  // hearing status
    array[N] real Am;     // chron. age - min( chron. age )
    array[N] real Hwsib;  // replicated entropies
    
}
parameters{
    
    // fixed effects parameters
    real a;               // intercept
    vector[cHS] aHS;      // HS effects
    // real bAm;             // Am effects
    vector[cHS] bAmHS;    // Am effects (per HS)

    // random effects parameters
    real m_b;             // block RE mean
    real<lower=0> s_b;    // block RE sd
    vector[B] z_b;        // non-centered block RE
    real m_i;             // child RE mean
    real<lower=0> s_i;    // child RE sd
    vector[I] z_i;        // non-centered child RE
    real m_u;             // child, utterance RE mean
    real<lower=0> s_u;    // child, utterance RE sd
    matrix[I,U] z_u;      // non-centered child, utterance RE
    
    // variability parameters
    // real<lower=0> Mw;     // 'sample size' parameter
    vector<lower=0>[I] Mw; // 'sample size' parameter
    
}
transformed parameters{
    
    // to track
    vector[B] b_i;        // block random effects
    vector[I] e_i;        // child random effects
    matrix[I,U] u_si;     // sentence random effects
    vector[I] u_i;        // sentence average random effects
    vector[I] SI;         // SI index
    vector[N] mu;         // NO TRACK
    
    // random effects
    b_i = m_b + s_b*z_b;  // non-centered block RE
    e_i = m_i + s_i*z_i;  // non-centered child RE
    u_si = m_u + s_u*z_u;  // non-centered utterance RE
    
    // intelligibility and average entropy
    for(i in 1:I){
      u_i[ i ] = mean( u_si[ i, ] );
    }
    
    for(n in 1:N){
      SI[ cid[n] ] = a + 
        aHS[ HS[n] ] + 
        // bAm*Am[n] + 
        bAmHS[ HS[n] ]*Am[n] + 
        e_i[ cid[n] ] +
        u_i[ cid[n] ];
      mu[n] = inv_logit( b_i[ bid[n] ] - SI[ cid[n] ] );
    }

}
model{

    // fixed effects priors
    a ~ normal( 0 , 0.05 );
    aHS ~ normal( 0 , 0.3 );
    // bAm ~ normal( 0 , 0.1 );
    bAmHS ~ normal( 0 , 0.1 );
    
    // random effects priors
    m_b ~ normal( 0 , 0.05 );
    s_b ~ exponential( 2 );
    z_b ~ std_normal();
    m_i ~ normal( 0 , 0.05 );
    s_i ~ exponential( 2 );
    z_i ~ std_normal();
    m_u ~ normal( 0 , 0.05 );
    s_u ~ exponential( 2 );
    to_vector( z_u ) ~ std_normal();
    
    // variability priors
    Mw ~ exponential( 0.5 );

    // likelihood
    for(n in 1:N){
      // Hwsib[n] ~ beta_proportion( mu[n] , Mw );
      Hwsib[n] ~ beta_proportion( mu[n] , Mw[ cid[n] ] );
    }
    
}
generated quantities{

    // track
    vector[N] log_lik;
    
    // log-likelihood
    for(n in 1:N){
      // log_lik[n] = beta_proportion_lpdf( Hwsib[n] | mu[n] , Mw );
      log_lik[n] = beta_proportion_lpdf( Hwsib[n] | mu[n] , Mw[ cid[n] ] );
    }
    
}
"

model_nam = "model12.stan"
writeLines(mcmc_code, con=file.path(save_dir, 'real_models', model_nam) ) 

```

Furthermore, the following code can be used to fit all the STAN models, changing the appropriate model name.

```{r}

model_nam = "modelXX.stan"
model_in = file.path(getwd(), 'real_models')
model_out = file.path(getwd(), 'real_chain')
mod = cmdstan_model( file.path(model_in, model_nam) )

print(model_nam)
mod$sample( data=dlist,
            output_dir=model_out,
            output_basename = str_replace(model_nam, '.stan', ''),
            chains=4, parallel_chains=4,
            max_treedepth=20, adapt_delta=0.95)
            
```

Model selection method and criteria

The current research used the Information-Theoretic Approach (Anderson, 2008; Chamberlain, 1965) for model selection and inference. The Information-Theoretic Approach is composed of three steps: (1) the expression of the research hypothesis into statistical models, (2) the selection of the most plausible models, and (3) the production of inferences based on one or multiple selected models. The first step of the approach is developed in Section 7.1, the second in Section 7.6, and the third in Section 8.

Information-Theoretic Approach

Approach to model selection and inference composed of three steps:

  1. Express research hypothesis into models,
  2. Select most plausible models,
  3. Produce inferences based on selected models

Related to second step, the selected criteria for choosing among competing models were the widely applicable information criterion (WAIC) (Watanabe, 2013) and the pareto-smoothed importance sampling cross-validation criterion (PSIS), developed by Vehtari and colleagues (2021). The use of WAIC and PSIS is justified based on two fundamental characteristics of the criteria. First, WAIC and PSIS incorporates all the information available in the posterior distribution of the parameters. This effectively integrates all the uncertainty inherent in the parameter estimates. Second, the criteria provides the most accurate approximation for the cross-validated deviance (McElreath, 2020), which serves as the closest estimate to the Kullback-Leibler divergence (Kullback and Leibler, 1951). The Kullback-Leibler divergence measures the degree to which a model accurately represents the actual distribution of the data.

Reasons to use WAIC and PSIS

The criteria:

  1. Incorporates all the uncertainty of the posterior distribution, and
  2. Evaluates the degree to which each model deviates from achieving perfect predictive accuracy for the data.

Consequently, by comparing the criteria values across different models this study evaluates the degree to which each model deviates from achieving perfect predictive accuracy for the data (McElreath, 2020). Specifically, models with lower WAIC or PSIS exhibit less deviation from perfect predictive accuracy, compared to alternative models. Additionally, this evaluation provides an indication of the level of uncertainty associated with the findings.

Results

Bayesian inference results

In this section, the results of the Bayesian inference procedures are presented. Model 6 and 12 are selected as representative, as they have the largest number of parameters among all the models (refer to Section 7.6). The selected models are then utilized to illustrate the quality of the Bayesian estimates, in terms of stationarity, convergence, mixing, and the information available in the posterior distribution of the parameters. It is crucial to emphasize that the authors conducted a meticulous inspection of all the fitted models. All models achieved similar results to those selected for illustration.

The following code loads the model information. The function file_id() is a user-defined function that identifies, within a particular directory, the files generated by STAN as a result of the fitting process.

```{r}

# load reference models
model_nam = "modelXX"
model_out = file.path( save_dir, 'real_chain')
model_fit = file_id( model_out, model_nam ) 
modelXX = rstan::read_stan_csv( file.path( model_out, model_fit ) )

```

Furthermore, the following code serves to display a concise information of the parameter estimates for the selected models.

Code
parameters = c( 'a','aHS','bAm','bAmHS',
                'm_b','s_b',
                'm_i','s_i',
                'm_u','s_u',
                's_w','Mw')

model06_parameters = parameter_recovery(
  stan_object = model06,
  est_par = parameters,
  p = 0.95 )
model06_parameters = model06_parameters[-c(6:7),]

model12_parameters = parameter_recovery(
  stan_object = model12,
  est_par = parameters,
  p = 0.95 )
model12_parameters = model12_parameters[-c(6:7),]
1
parameters of interest
2
user-defined function: displays concise parameter estimate information for selected model
3
user-defined function: displays concise parameter estimate information for selected model

Stationarity, convergence and mixing

Figure 33 through Figure 40 show the trace, trace-rank, and ACF plots for selected parameters of the chosen models. First, the left panels of the figures display the trace plots. These plots illustrate that for each parameter, the chains’ post-warm-up iterations visually converge to a mean value with a constant variance. Second, the middle panel of the figures shows the trace-rank plots. These panels reveal that each parameter chains explored their respective parameter space in a seemingly random manner. Third, the right panel of the figures shows the ACF plots. These panels demonstrate that each parameter chains have relatively low autocorrelations.

Furthermore, Figure 41 shows the \hat{R} statistic for each parameter of the selected model. The figure reveals that the statistics supports the assessment of convergence in the chains’ post-warm-up iterations for each parameter, as no parameter \hat{R} exceeds the threshold of 1.05.

Code
tri_plot( stan_object=model06,
          pars=c('a','aHS[1]','aHS[2]') )
1
used-defined function: generation of trace, trace rank, and ACF plots for selected parameters within a model

Figure 33: Model 06, trace, trace rank and ACF plots of selected parameters
Code
tri_plot( stan_object=model12,
          pars=c('bAmHS[1]','bAmHS[2]') )
1
used-defined function: generation of trace, trace rank, and ACF plots for selected parameters within a model

Figure 34: Model 12, trace, trace rank and ACF plots of selected parameters
Code
tri_plot( stan_object=model06,
          pars=c('m_b','s_b') )
1
used-defined function: generation of trace, trace rank, and ACF plots for selected parameters within a model

Figure 35: Model 06, trace, trace rank and ACF plots of selected parameters
Code
tri_plot( stan_object=model12,
          pars=c('m_i','s_i') )
1
used-defined function: generation of trace, trace rank, and ACF plots for selected parameters within a model

Figure 36: Model 1, trace, trace rank and ACF plots of selected parameters
Code
tri_plot( stan_object=model06,
          pars=c('m_u','s_u') )
1
used-defined function: generation of trace, trace rank, and ACF plots for selected parameters within a model

Figure 37: Model 06, trace, trace rank and ACF plots of selected parameters
Code
tri_plot( stan_object=model12,
          pars=paste0('Mw[', 1:5,']') )
1
used-defined function: generation of trace, trace rank, and ACF plots for selected parameters within a model

Figure 38: Model 12, trace, trace rank and ACF plots of selected parameters
Code
tri_plot( stan_object=model06,
          pars=paste0('s_w[', 1:5,']') )
1
used-defined function: generation of trace, trace rank, and ACF plots for selected parameters within a model

Figure 39: Model 06, trace, trace rank and ACF plots of selected parameters
Code
tri_plot( stan_object=model12,
          pars=paste0('SI[', 1:5, ']') )
1
used-defined function: generation of trace, trace rank, and ACF plots for selected parameters within a model

Figure 40: Model 12, trace, trace rank and ACF plots of selected parameters
Code
par( mfrow=c(1,2) )

plot( 1:nrow(model06_parameters), model06_parameters$Rhat4,
      ylim=c(0.95, 1.1), pch=19, col=rgb(0,0,0,alpha=0.3),
      xaxt='n',xlab='', ylab='Rhat', 
      main='Normal GLLAMM: model 06')
axis( side=1, at=1:nrow(model06_parameters),
      labels=rownames(model06_parameters),
      cex.axis=0.8, las=2 )
abline( h=1.05, lty=2 )

plot( 1:nrow(model12_parameters), model12_parameters$Rhat4,
      ylim=c(0.95, 1.1), pch=19, col=rgb(0,0,0,alpha=0.3),
      xaxt='n',xlab='', ylab='Rhat', 
      main='Beta-proportion GLLAMM: model 12')
axis( side=1, at=1:nrow(model12_parameters),
      labels=rownames(model12_parameters),
      cex.axis=0.8, las=2 )
abline( h=1.05, lty=2 )

par( mfrow=c(1,1) )
1
model 06: Rhat values plot
2
model 06: parameters names in x-axis
3
model 06: convergence threshold
4
model 12: Rhat values plot
5
model 12: parameters names in x-axis
6
model 12: convergence threshold

Figure 41: Selected models, Rhat values

Posterior information

Figure 42 through Figure 46 show the density plots for selected parameters of the chosen models. The figures reveal that the parameters’ posterior distributions have enough information. This implies the posterior distributions have been generated with sufficient uncorrelated sampling points. Furthermore, Figure 47 shows the plots for effective sample size statistics (n_{\text{eff}}) for each parameter. The left and right panels of the figure demonstrate that most of the parameters have n_{\text{eff}} > 2000, with a few others having n_{\text{eff}} < 2000 post-warm-up iterations set by the Bayesian procedure (refer to Section 7.3). In aggregate, this provides supporting evidence that the parameters’ posterior distributions have sufficient information about the parameters, and they make substantive sense compared to the models’ prior beliefs.

On the other hand, the figures also reveal that all the posteriors are unimodal distributions with values centered around a mean. This provides additional evidence of stationarity and convergence of the distributions, as described in Section 8.1.1.

Code
pars = c('a','aHS[1]','aHS[2]','bAmHS[1]','bAmHS[2]')

par( mfrow=c(2,3) )
dens_plot( stan_object=model06, p=0.95,
             pars=pars[1], x_lab=pars[1] )

for( i in 2:length(pars) ){
  dens_plot( stan_object=model06, p=0.95,
             pars=pars[i], x_lab=pars[i] )   
}
par( mfrow=c(1,1) )
1
parameters of interest
2
used-defined function: generation of density plot with HPDI confidence intervals for selected parameters

Figure 42: Model 06, density plots of selected parameters
Code
pars = c('m_b','m_i','m_u','s_b','s_i','s_u')

par( mfrow=c(2,3) )
dens_plot( stan_object=model12, p=0.95,
             pars=pars[1], x_lab=pars[1] )

for( i in 2:length(pars) ){
  dens_plot( stan_object=model12, p=0.95,
             pars=pars[i], x_lab=pars[i] )   
}
par( mfrow=c(1,1) )
1
parameters of interest
2
used-defined function: generation of density plot with HPDI confidence intervals for selected parameters

Figure 43: Model 12, density plots of selected parameters
Code
pars = paste0('s_w[',1:6,']')

par( mfrow=c(2,3) )
dens_plot( stan_object=model06, p=0.95,
             pars=pars[1], x_lab=pars[1] )

for( i in 2:length(pars) ){
  dens_plot( stan_object=model06, p=0.95,
             pars=pars[i], x_lab=pars[i] )   
}
par( mfrow=c(1,1) )
1
parameters of interest
2
used-defined function: generation of density plot with HPDI confidence intervals for selected parameters

Figure 44: Model 06, density plots of selected parameters
Code
pars = paste0('Mw[',1:6,']')

par( mfrow=c(2,3) )
dens_plot( stan_object=model12, p=0.95,
             pars=pars[1], x_lab=pars[1] )

for( i in 2:length(pars) ){
  dens_plot( stan_object=model12, p=0.95,
             pars=pars[i], x_lab=pars[i] )   
}
par( mfrow=c(1,1) )
1
parameters of interest
2
used-defined function: generation of density plot with HPDI confidence intervals for selected parameters

Figure 45: Model 12, density plots of selected parameters
Code
pars = paste0('SI[',1:6,']')

par( mfrow=c(2,3) )
dens_plot( stan_object=model06, p=0.95,
             pars=pars[1], x_lab=pars[1] )

for( i in 2:length(pars) ){
  dens_plot( stan_object=model06, p=0.95,
             pars=pars[i], x_lab=pars[i] )   
}
par( mfrow=c(1,1) )
1
parameters of interest
2
used-defined function: generation of density plot with HPDI confidence intervals for selected parameters

Figure 46: Model 06, density plots of selected parameters
Code
par( mfrow=c(1,2) )

plot( 1:nrow(model06_parameters), model06_parameters$n_eff,
      ylim=c(0,10000), pch=19, col=rgb(0,0,0,alpha=0.3),
      xaxt='n',xlab='', ylab='neff', 
      main='Normal GLLAMM: model 06')
axis( side=1, at=1:nrow(model06_parameters),
      labels=rownames(model06_parameters),
      cex.axis=0.8, las=2 )
abline( h=c(0, 1000, 2000), lty=2 )

plot( 1:nrow(model12_parameters), model12_parameters$n_eff,
      ylim=c(0,10000), pch=19, col=rgb(0,0,0,alpha=0.3),
      xaxt='n',xlab='', ylab='neff', 
      main='Beta-proportion GLLAMM: model 12')
axis( side=1, at=1:nrow(model12_parameters),
      labels=rownames(model12_parameters),
      cex.axis=0.8, las=2 )
abline( h=c(0, 1000, 2000), lty=2 )

par( mfrow=c(1,1) )
1
model 06: neff values plot
2
model 06: parameters names in x-axis
3
model 06: convergence threshold
4
model 12: neff values plot
5
model 12: parameters names in x-axis
6
model 12: convergence threshold

Figure 47: Selected models, Neff values

Research question 1

talk about:

  1. selection of models
  2. a comparison of predictions
  3. the variability
  • WAIC and PSIS indicate that beta model 7 is way better. Why, because the predictions now fall within the limits of the data (the other model shows a clear sign of underfitting )
  • Most variability is at the word-level. If this is not controlled you would be overestimating the precision of the tests. Because each observation doesn’t count as one effective piece of info, but less than that because they are correlated.
  • the unexplained variability of children and sentences follow in terms of the amount. This just describe how much variability is expected because of each thing.
  • the variability of block effects indicate that there might be some presence of lack of inter-listener reliability. If this was not controller this lack of reliability might creep in the hypothesis test of the parameters
Code
require(rethinking)

set.seed(12345)

RQ1_WAIC = compare( func=WAIC,
                    model01, model04,
                    model07, model10 )

round( RQ1_WAIC, 3 )
1
package requirement
2
seed for replication
3
comparison of selected models with WAIC
4
reporting with three decimal points
             WAIC      SE    dWAIC     dSE   pWAIC weight
model10 -9630.594 276.575    0.000      NA  55.261      1
model07 -9585.461 274.516   45.133  17.834  31.925      0
model04 -1310.091 123.839 8320.503 255.488 111.054      0
model01  -899.016  94.370 8731.578 261.897  37.392      0
Code
require(rethinking)

set.seed(12345)

RQ1_PSIS = compare( func=PSIS,
                    model01, model04,
                    model07, model10 )

round( RQ1_PSIS, 3 )
1
package requirement
2
seed for replication
3
comparison of selected models with PSIS
4
reporting with three decimal points
             PSIS      SE    dPSIS     dSE   pPSIS weight
model10 -9629.326 276.672    0.000      NA  55.895      1
model07 -9585.333 274.576   43.993  17.644  31.989      0
model04 -1306.812 124.660 8322.514 255.704 112.693      0
model01  -898.876  94.397 8730.450 261.874  37.462      0
Code
data_pred( true_data=data_H,
           stan_object=model01,
           normal=T, p=0.95, 
           y_lim=c(-0.5, 1.1) )
1
user defined function: plot entropy scores per child with transformation of potential intelligibility into entropy scale

Figure 48: Model 01, entropy score predictions
Code
data_pred( true_data=data_H,
           stan_object=model04,
           normal=T, p=0.95, 
           y_lim=c(-0.5, 1.1) )
1
user defined function: plot entropy scores per child with transformation of potential intelligibility into entropy scale

Figure 49: Model 04, entropy score predictions
Code
data_pred( true_data=data_H,
           stan_object=model07,
           normal=F, p=0.95, 
           y_lim=c(-0.5, 1.1) )
1
user defined function: plot entropy scores per child with transformation of potential intelligibility into entropy scale

Figure 50: Model 07, entropy score predictions
Code
data_pred( true_data=data_H,
           stan_object=model10,
           normal=F, p=0.95, 
           y_lim=c(-0.5, 1.1) )
1
user defined function: plot entropy scores per child with transformation of potential intelligibility into entropy scale

Figure 51: Model 10, entropy score predictions
Code
pars = c('s_b','s_i','s_u','Mw')

par( mfrow=c(2,2) )

dens_plot( stan_object=model07, p=0.95,
             pars=pars[1], x_lab=pars[1] )

for( i in 2:length(pars) ){
  dens_plot( stan_object=model07, p=0.95,
             pars=pars[i], x_lab=pars[i] )   
}

par( mfrow=c(1,1) )
1
parameters of interest
2
used-defined function: generation of density plot with HPDI confidence intervals for selected parameters

Figure 52: Model 07, variability parameters density plots
Code
dens_var( stan_object=model07, p=0.95,
          bp=0.5, bM=3,
          pars=c('s_b','s_i','s_u','Mw')) 
1
user defined function: plots the transformation of the variability parameters into the entropy scale.

Figure 53: Model 07, variability parameters density plots at the entropy scale

Research question 2

talk about:

  1. the capacity to create a potential intelligibility score
  2. notice the ordering is not the same as the previous plot for model 07 (makes sense)
  3. epistemiological concerns
  • We can construct potential intelligibility.
  • This allows us to offer a ranking of individuals, and even make comparison among them assessing also the certainty of the comparison
Code
parameters = 'SI'

model10_SI = parameter_recovery(
  stan_object = model07,
  est_par = parameters,
  p = 0.95 )

idx = order( model10_SI$mean, decreasing=F)
model10_SI = model10_SI[idx, ]
model10_SI
1
parameter of interest
2
user-defined function: displays concise parameter estimate information for selected model
3
sorting an presenting ordered potential intelligibility
         mean    sd CI_lower CI_upper HPDI_lower HPDI_upper    n_eff Rhat4
SI[6]  -2.200 0.214   -2.622   -1.792     -2.613     -1.785 2567.539 1.001
SI[9]  -0.898 0.202   -1.288   -0.507     -1.268     -0.489 2821.647 1.000
SI[17] -0.830 0.200   -1.215   -0.434     -1.201     -0.426 2330.581 1.001
SI[1]  -0.785 0.199   -1.160   -0.388     -1.165     -0.399 2593.797 1.001
SI[11] -0.520 0.200   -0.895   -0.129     -0.883     -0.121 2427.512 1.002
SI[29] -0.433 0.201   -0.831   -0.042     -0.815     -0.031 2760.899 1.002
SI[12] -0.260 0.199   -0.651    0.124     -0.639      0.132 2406.853 1.001
SI[19] -0.165 0.197   -0.552    0.228     -0.533      0.243 2403.740 1.000
SI[2]  -0.073 0.198   -0.461    0.318     -0.453      0.321 2663.327 1.001
SI[8]  -0.002 0.196   -0.384    0.388     -0.398      0.371 2449.699 1.002
SI[30]  0.016 0.203   -0.377    0.424     -0.396      0.402 2624.290 1.001
SI[32]  0.044 0.195   -0.334    0.422     -0.332      0.423 2539.299 1.000
SI[4]   0.105 0.198   -0.284    0.493     -0.283      0.493 2514.952 1.001
SI[13]  0.125 0.200   -0.264    0.523     -0.238      0.541 2277.597 1.002
SI[26]  0.175 0.195   -0.208    0.559     -0.223      0.537 2430.423 1.001
SI[10]  0.193 0.197   -0.183    0.580     -0.184      0.578 2609.567 1.001
SI[28]  0.239 0.196   -0.146    0.636     -0.128      0.650 2478.483 1.002
SI[18]  0.260 0.199   -0.132    0.651     -0.147      0.634 2568.642 1.001
SI[7]   0.298 0.200   -0.108    0.675     -0.090      0.692 2595.222 1.001
SI[24]  0.322 0.192   -0.043    0.700     -0.030      0.708 2442.892 1.003
SI[14]  0.353 0.196   -0.031    0.743     -0.024      0.746 2548.653 1.000
SI[5]   0.440 0.193    0.055    0.823      0.032      0.790 2505.138 1.001
SI[15]  0.515 0.196    0.127    0.900      0.153      0.922 2174.687 1.002
SI[22]  0.516 0.191    0.144    0.884      0.142      0.881 2371.691 1.001
SI[27]  0.549 0.197    0.172    0.947      0.160      0.931 2510.259 1.000
SI[3]   0.561 0.197    0.179    0.945      0.188      0.953 2442.687 1.001
SI[31]  0.694 0.194    0.327    1.082      0.308      1.059 2315.612 1.001
SI[16]  0.709 0.199    0.315    1.094      0.334      1.109 2545.100 1.001
SI[25]  0.713 0.194    0.342    1.095      0.338      1.083 2338.987 1.000
SI[21]  0.715 0.194    0.340    1.104      0.345      1.106 2791.421 1.001
SI[23]  0.771 0.192    0.398    1.145      0.409      1.155 2354.139 1.001
SI[20]  0.861 0.194    0.483    1.251      0.464      1.225 2733.497 1.000
Code
col_func = colorRampPalette(c("red", "blue"))
col_plot = col_func( nrow(model10_SI) )

SI_plot( precis_object=model10_SI,
         col_string=col_plot )
1
color generation
2
user defined function: plot ordered potential intelligibility

Figure 54: Model 10, potential intelligibility per child

Research question 3

talk about:

  1. model selection
  2. parameter interpretations
  3. parameter contrasts
  4. potential intelligibility estimation
  5. model predictions
  6. outlier identification
  • the evidence is not unequivocal between model 10, 11, and 12.
  • model 10 to estimate SI without other knowledge
  • model 12 indicates that HS start at the same level as HI/CI, but they evolve differently
  • no evaluation if the evolution achieves a valley with more chronological age
  • interesting model 10 is similar to model 07, but with one extra assumption: there is some specific word-level variability for each child.
Code
require(rethinking)

set.seed(12345)

RQ3_WAIC = compare( func=WAIC,
                    model01, model02, model03, 
                    model04, model05, model06,
                    model07, model08, model09, 
                    model10, model11, model12 )

round( RQ3_WAIC, 3 )
1
package requirement
2
seed for replication
3
comparison of selected models with WAIC
4
reporting with three decimal points
             WAIC      SE    dWAIC     dSE   pWAIC weight
model12 -9632.361 276.730    0.000      NA  54.353  0.473
model11 -9631.647 276.754    0.714   1.185  54.763  0.331
model10 -9630.594 276.575    1.767   3.141  55.261  0.196
model08 -9586.070 274.339   46.292  18.246  31.434  0.000
model09 -9585.903 274.378   46.458  18.232  31.577  0.000
model07 -9585.461 274.516   46.900  18.453  31.925  0.000
model04 -1310.091 123.839 8322.271 255.482 111.054  0.000
model06 -1308.471 123.738 8323.891 255.558 111.281  0.000
model05 -1307.905 123.777 8324.456 255.593 111.697  0.000
model02  -900.578  94.027 8731.784 262.234  36.529  0.000
model03  -900.202  94.069 8732.159 262.218  36.785  0.000
model01  -899.016  94.370 8733.345 262.137  37.392  0.000
Code
require(rethinking)

set.seed(12345)

RQ3_PSIS = compare( func=PSIS,
                    model01, model02, model03, 
                    model04, model05, model06,
                    model07, model08, model09, 
                    model10, model11, model12 )

round( RQ3_PSIS, 3 )
1
package requirement
2
seed for replication
3
comparison of selected models with PSIS
4
reporting with three decimal points
             PSIS      SE    dPSIS     dSE   pPSIS weight
model12 -9630.790 276.844    0.000      NA  55.139  0.419
model11 -9630.597 276.845    0.193   0.911  55.288  0.380
model10 -9629.326 276.672    1.465   3.300  55.895  0.201
model08 -9585.922 274.399   44.868  18.061  31.508  0.000
model09 -9585.777 274.437   45.014  18.041  31.640  0.000
model07 -9585.333 274.576   45.457  18.271  31.989  0.000
model04 -1306.812 124.660 8323.978 255.688 112.693  0.000
model06 -1306.070 124.054 8324.721 255.580 112.482  0.000
model05 -1305.127 124.011 8325.663 255.562 113.086  0.000
model02  -900.459  94.051 8730.331 262.209  36.589  0.000
model03  -900.088  94.094 8730.702 262.193  36.842  0.000
model01  -898.876  94.397 8731.914 262.113  37.462  0.000
Code
parameters = c( 'a','aHS','bAm','bAmHS',
                'm_b','s_b','m_i','s_i',
                'm_u','s_u','Mw')

model12_parameters = parameter_recovery(
  stan_object = model12,
  est_par = parameters,
  p = 0.95 )
model12_parameters = model12_parameters[-c(6:7),]

model12_parameters[1:5,]
1
parameters of interest
2
user-defined function: displays concise parameter estimate information for selected model
3
reporting
          mean    sd CI_lower CI_upper HPDI_lower HPDI_upper    n_eff Rhat4
a        0.012 0.049   -0.086    0.107     -0.085      0.108 6267.560 1.000
aHS[1]   0.219 0.253   -0.288    0.700     -0.296      0.689 4177.409 1.000
aHS[2]   0.237 0.241   -0.244    0.701     -0.233      0.709 2709.563 1.000
bAmHS[1] 0.097 0.016    0.066    0.127      0.067      0.128 3298.478 1.000
bAmHS[2] 0.056 0.014    0.029    0.086      0.028      0.084 1833.763 1.004
Code
require(coda)

post = extract.samples( model12 )

cont_post = data.frame( post$aHS[,2] - post$aHS[,1],
                        post$bAmHS[,2] - post$bAmHS[,1] )
colnames(cont_post) = c( 'aHS[2] - aHS[1]',
                         'bAmHS[2] - bAmHS[1]' )

hpdi_res = HPDinterval( as.mcmc(cont_post), prob=0.95 )
attr(hpdi_res, 'dimnames')[[2]] = c('HPDI_lower','HPDI_upper') 

cont_res = precis( cont_post, depth=2, hist=F, prob=0.95 )
names(cont_res)[3:4] = c('CI_lower','CI_upper')

cont_res = cbind(cont_res, hpdi_res)
round( cont_res, 3 ) 
1
package requirement
2
posterior distribution samples for selected parameters
3
contrasts calculation
4
95% Highest Probability Density Interval (HPDI) for contrasts
5
concise estimate information for contrasts
6
reporting with three decimal points
                      mean    sd CI_lower CI_upper HPDI_lower HPDI_upper
aHS[2] - aHS[1]      0.018 0.346   -0.668    0.709     -0.701      0.666
bAmHS[2] - bAmHS[1] -0.041 0.020   -0.081   -0.002     -0.078      0.000
Code
parameters = 'SI'

model12_SI = parameter_recovery(
  stan_object = model12,
  est_par = parameters,
  p = 0.95 )
nam = rownames(model12_SI)

d_extra = unique( data.frame( dlist[ c('cid','HS','Am') ] ) )
model12_SI = cbind(d_extra, model12_SI)
rownames(model12_SI) = nam

idx = with( model12_SI,
            order( HS, mean, decreasing=F ) )
model12_SI = model12_SI[idx, ]
model12_SI[,-c(1,11)]
1
parameter selection
2
user-defined function: displays concise parameter estimate information for selected model
3
extracting and concatenating estimates with hearing status and chronological age data
4
selecting, ordering and reporting data based on hearing status categories and chronological age
       HS Am   mean    sd CI_lower CI_upper HPDI_lower HPDI_upper    n_eff
SI[11]  1 14  1.054 0.176    0.696    1.390      0.713      1.398 3204.386
SI[8]   1 18  1.292 0.199    0.892    1.670      0.884      1.654 3991.232
SI[12]  1 14  1.341 0.182    0.969    1.679      0.989      1.697 4162.150
SI[32]  1 10  1.502 0.200    1.098    1.879      1.084      1.861 4271.165
SI[4]   1 12  1.529 0.197    1.138    1.906      1.165      1.928 4674.449
SI[10]  1 14  1.644 0.194    1.253    2.025      1.245      2.011 4758.323
SI[18]  1 18  1.885 0.214    1.465    2.289      1.492      2.307 3654.922
SI[24]  1 14  2.093 0.217    1.660    2.491      1.666      2.494 3624.179
SI[5]   1 14  2.109 0.219    1.660    2.523      1.677      2.538 2972.132
SI[16]  1 10  2.378 0.261    1.859    2.877      1.861      2.878 5606.824
SI[27]  1 25  2.454 0.224    1.998    2.869      1.993      2.860 4429.732
SI[21]  1 20  2.494 0.241    2.006    2.947      2.002      2.940 3967.784
SI[3]   1 25  2.622 0.226    2.156    3.027      2.154      3.022 3991.461
SI[22]  1 30  2.636 0.221    2.177    3.044      2.213      3.072 4337.945
SI[23]  1 26  2.872 0.256    2.361    3.356      2.356      3.350 5015.862
SI[20]  1 29  3.412 0.260    2.871    3.886      2.875      3.891 4617.563
SI[6]   2 17 -0.490 0.201   -0.894   -0.091     -0.898     -0.100 3699.699
SI[17]  2  8  0.654 0.174    0.293    0.976      0.315      0.987 3247.691
SI[9]   2 15  0.675 0.163    0.332    0.984      0.330      0.978 3872.550
SI[1]   2 17  0.677 0.172    0.326    1.016      0.313      1.001 2948.150
SI[29]  2 12  1.008 0.176    0.649    1.346      0.672      1.359 4708.604
SI[19]  2 18  1.186 0.191    0.796    1.556      0.808      1.560 2387.089
SI[2]   2  0  1.232 0.200    0.833    1.619      0.837      1.621 3864.454
SI[30]  2 16  1.434 0.200    1.040    1.819      1.047      1.824 3572.047
SI[28]  2 25  1.544 0.202    1.145    1.925      1.172      1.939 4180.121
SI[13]  2 17  1.547 0.204    1.126    1.937      1.152      1.949 4473.619
SI[7]   2 36  1.594 0.220    1.152    2.016      1.162      2.023 1903.714
SI[26]  2 17  1.754 0.218    1.317    2.161      1.315      2.161 2775.056
SI[15]  2 17  1.772 0.222    1.320    2.196      1.370      2.235 2887.050
SI[14]  2 17  1.933 0.224    1.475    2.356      1.512      2.389 4783.318
SI[25]  2 25  2.471 0.260    1.950    2.980      1.968      2.990 2120.491
SI[31]  2 36  2.636 0.249    2.113    3.102      2.165      3.134 4342.191
Code
col_func = colorRampPalette(c("red", "blue"))
col_plot = col_func( nrow(model12_SI) )

idx = with( model12_SI,
            order( mean, decreasing=F ) )
model12_SI = model12_SI[idx, ]

SI_plot( precis_object=model12_SI,
         col_string=col_plot )
1
color generation
2
ordering data in terms of potential intelligibility
3
user defined function: plot ordered potential intelligibility

Figure 55: Model 12, ordered potential intelligibility per child
Code
col_plot = rep(rethink_palette[1:2], each=16)

idx = with( model12_SI,
            order( HS, mean, decreasing=F ) )
model12_SI = model12_SI[idx, ]

SI_plot( precis_object=model12_SI,
         col_string=col_plot )
abline( v=16.5, lty=2 )
legend( 'topleft', c('NH','HI/CI'), 
        fill=rethink_palette[1:2], bty='n' )
1
color generation
2
ordering data in terms of potential intelligibility
3
user defined function: plot ordered potential intelligibility

Figure 56: Model 12, ordered potential intelligibility per child and hearing status
Code
idx = with( model12_SI,
            order( HS, Am, mean, decreasing=F ) )
model12_SI = model12_SI[idx, ]

plot_col = rethink_palette[1:2]
  
model_pred( precis_object=model12_SI,
            parameter_object=model12_parameters,
            col_string=plot_col )
1
sorting data with hearing status and chronological age
2
defining colors for plot
3
user defined function: plot of potential intelligibility on chronological age, per hearing status group

Figure 57: Model 12, potential intelligibility per hearing status group and chronological age
Code
data_pred( true_data=data_H,
           stan_object=model12,
           normal=F, p=0.95, 
           y_lim=c(-0.1, 1.1) )
1
user defined function: plot entropy scores per child with transformation of potential intelligibility into entropy scale

Figure 58: Model 12, entropy score predictions
Code
require(rethinking)

set.seed(12345)

model_WAIC = WAIC( model12, pointwise=T )
model_PSIS = PSIS( model12, pointwise=T )
1
package requirement
2
seed for replication
3
pointwise WAIC estimates
4
pointwise PSIS estimates
Code
plot( model_PSIS$k, model_WAIC$penalty,
      xlab='PSIS k values', ylab='WAIC penalty',
      pch=19, col=rgb(0,0,0,alpha=0.2) )
abline( v=0.5, lty=2)

idx = which(model_PSIS$k>0.5)
for(i in idx ){
  text( x=model_PSIS$k[i], y=model_WAIC$penalty[i], 
        paste0('(', dlist$uid[i], ',', dlist$cid[i], ')' ) )   
}
1
plot of WAIC penalty and PSIS k values
2
identification of influential observations (sentences, individuals)

Figure 59: Outlier identification,

Journal (to erase)

Things to consider

  1. it is important to address the issue of statistical power. (Need to be addressed)

  2. Multiple NHST tests inflate null-hypothesis rejection rates. (NO need to address because of Bayesian)

  1. p-hacking (NO issues)

  2. dropping observation (NO issues)

  3. covariate analysis (NO issues, mention they are exploratory)

  1. Rich descriptions of the data help reviewers, the Editor, and other readers understand your findings. (already done)

  2. Cherry picking experiments, conditions, DVs, or observations can be misleading. (NO issues)

  3. Be careful about using null results to infer “boundary conditions” for an effect. (NO issues)

  4. Authors should use statistical methods that best describe and convey the properties of their data. (already done)

Follow Behavior research methods for more information.

References

Allaire, J., Teague, C., Scheidegger, C., Xie, Y., and Dervieux, C. (2022). Quarto. https://doi.org/10.5281/zenodo.5960048
Anderson, D. (2008). Model based inference in the life sciences: A primer on evidence. Springer.
Baker, F. (1998). An investigation of the item parameter recovery characteristics of a gibbs sampling procedure. Applied Psychological Measurement, 22(22), 153–169. https://doi.org/10.1177/01466216980222005
Baldwin, S., and Fellingham, G. (2013). Bayesian methods for the analysis of small sample multilevel data with a complex variance structure. Journal of Psychological Methods, 18(2), 151–164. https://doi.org/10.1037/a0030642
Boonen, N., Kloots, H., Nurzia, P., and Gillis, S. (2021). Spontaneous speech intelligibility: Early cochlear implanted children versus their normally hearing peers at seven years of age. Journal of Child Language, 1–26. https://doi.org/10.1017/S0305000921000714
Brooks, S., Gelman, A., Jones, G., and Meng, X. (2011). Handbook of markov chain monte carlo (1st ed.). Chapman; Hall, CRC. https://doi.org/10.1201/b10905
Chamberlain, T. (1965). The method of multiple working hypotheses. Science, 148(3671), 754–759. Retrieved from https://www.jstor.org/stable/1716334
Denwood, M. (2016). runjags: An R package providing interface utilities, model templates, parallel computing methods and additional distributions for MCMC models in JAGS. Journal of Statistical Software, 71(9), 1–25. https://doi.org/10.18637/jss.v071.i09
Depaoli, S. (2014). The impact of inaccurate “informative” priors for growth parameters in bayesian growth mixture modeling. Journal of Structural Equation Modeling, 21, 239–252. https://doi.org/10.1080/10705511.2014.882686
Depaoli, S. (2021). Bayesian structural equation modeling. The Guilford Press.
Depaoli, S., and Schoot, R. van de. (2017). Improving transparency and replication in bayesian statistics: The WAMBS-checklist. Psychological Methods, 22(2), 240–261. https://doi.org/10.1037/met0000065
Everitt, B., and Skrondal, A. (2010). The cambridge dictionary of statistics. Cambridge University Press.
Faes, J., De Maeyer, S., and Gillis, S. (2021). Speech intelligibility of children with an auditory brainstem implant: A triple-case study. 1–50.
Flipsen, P. (2006). Measuring the intelligibility of conversational speech in children. Clinical Linguistics & Phonetics, 20(4), 303–312. https://doi.org/10.1080/02699200400024863
Freeman, V., Pisoni, D., Kronenberger, W., and Castellanos, I. (2017). Speech intelligibility and psychosocial functioning in deaf children and teens with cochlear implants. Journal of Deaf Studies and Deaf Education, 22(3), 278–289. https://doi.org/10.1093/deafed/enx001
Gabry, J., and Češnovar, R. (2022). Cmdstanr: R interface to ’CmdStan’.
Gelman, A., Carlin, J., Stern, H., Dunson, D., Vehtari, A., and Rubin, D. (2014). Bayesian data analysis (3rd ed.). Chapman; Hall/CRC.
Gorinova, M., Moore, D., and Hoffman, M. (2019). Automatic reparameterisation of probabilistic programs. Retrieved from https://doi.org/10.48550/arXiv.1906.03028
Heuven, V. van. (2008). Making sense of strange sounds: (Mutual) intelligibility of related language varieties. A review. International Journal of Humanities and Arts Computing, 2(1-2), 39–62. https://doi.org/10.3366/E1753854809000305
Hoffman, M., and Gelman, A. (2014). The no-u-turn sampler: Adaptively setting path lengths in hamiltonian monte carlo. Journal of Machine Learning Research, 15, 1593–1623. Retrieved from https://www.jmlr.org/papers/volume15/hoffman14a/hoffman14a.pdf
Holmes, W., Bolin, J., and Kelley, K. (2019). Multilevel modeling using r (2nd edition). Chapman; Hall/CRC. https://doi.org/10.1201/9781351062268
Hoyle, R. (eds. ). (2014). Handbook of structural equation modeling. Guilford Press.
Kim, S., and Cohen, A. (1999). Accuracy of parameter estimation in gibbs sampling under the two-parameter logistic model. Annual Meeting of the American Educational Research Association. American Educational Research Association. Retrieved from https://eric.ed.gov/?id=ED430012
Kruschke, D. (2015). Doing bayesian data analysis: A tutorial with r, JAGS, and stan. Elsevier. Retrieved from https://www.sciencedirect.com/book/9780124058880/doing-bayesian-data-analysis
Kullback, S., and Leibler, R. (1951). On information and sufficiency. The Annals of Mathematical Statistics, 22(1), 79–86. Retrieved from http://www.jstor.org/stable/2236703
Lagerberg, T., Asberg, J., Hartelius, L., and Persson, C. (2014). Assessment of intelligibility using children’s spontaneous speech: Methodological aspects. International Journal of Language and Communication Disorders, 49(2), 228–239. https://doi.org/10.1111/1460-6984.12067
Lambert, P., Sutton, A., Burton, P., Abrams, K., and Jones, D. (2006). How vague is vague? A simulation study of the impact of the use of vague prior distributions in MCMC using WinBUGS. Journal of Statistics in Medicine, 24(15), 2401–2428. https://doi.org/10.1002/sim.2112
Lebl, J. (2022). Basic analysis i & II: Introduction to real analysis, volumes i & II. Retrieved from https://www.jirka.org/ra/html/frontmatter-1.html
Lee, Y., and Nelder, J. A. (1996). Hierarchical generalized linear models. Journal of the Royal Statistical Society: Series B (Methodological), 58(4), 619–656. https://doi.org/10.1111/j.2517-6161.1996.tb02105.x
Luce, R. (1959). On the possible psychophysical laws. The Psychologcal Review, 66(2), 482–499. https://doi.org/10.1037/h0043178
Martin, J., and McDonald, R. (1975). Bayesian estimation in unrestricted factor analysis: A treatment for heywood cases. Psychometrika, (40), 505–517. https://doi.org/10.1007/BF02291552
McElreath, R. (2020). Statistical rethinking: A bayesian course with examples in r and STAN. Chapman; Hall/CRC.
McElreath, R. (2021). Rethinking: Statistical rethinking book package.
Neal, R. (2003). Slice sampling. The Annals of Statistics, 31(3), 705–741. https://doi.org/)
Neuwirth, E. (2022). RColorBrewer: ColorBrewer palettes. Retrieved from https://CRAN.R-project.org/package=RColorBrewer
Plummer, M., Best, N., Cowles, K., and Vines, K. (2006). CODA: Convergence diagnosis and output analysis for MCMC. R News, 6(1), 7–11. Retrieved from https://journal.r-project.org/archive/
R Core Team. (2015). R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing. Retrieved from http://www.R-project.org/
Rabe-Hesketh, S., Skrondal, A., and Pickles, A. (2004a). Generalized multilevel structural equation modeling. Psychometrika, 69(2), 167–190. https://doi.org/https://www.doi.org/10.1007/BF02295939
Rabe-Hesketh, S., Skrondal, A., and Pickles, A. (2004b). GLLAMM manual. UC Berkeley Division of Biostatistics. Retrieved from http://www.biostat.jhsph.edu/~fdominic/teaching/bio656/software-gllamm.manual.pdf
Rabe-Hesketh, S., Skrondal, A., and Pickles, A. (2004c). Maximum likelihood estimation of limited and discrete dependent variable models with nested random effects. Journal of Econometrics, 128(2), 301–323. https://doi.org/https://www.doi.org/10.1016/j.jeconom.2004.08.017
Seaman, S. jr., J., and Stamey, J. (2011). Hidden dangers of specifying noninformative priors. The American Statistician, 66(2), 77–84. https://doi.org/10.1080/00031305.2012.695938
Shannon, C. (1948). A mathematical theory of communication. The Bell System Technical Journal, 27(3), 379–423. https://doi.org/10.1002/j.1538-7305.1948.tb01338.x
Skrondal, A., and Rabe-Hesketh, S. (2004). Generalized latent variable modeling: Multilevel, longitudinal, and structural equation models. Chapman & Hall/CRC Press.
Stan Development Team. (2020). RStan: The R interface to Stan. Retrieved from http://mc-stan.org/
Stan Development Team. (2021). Stan modeling language users guide and reference manual, version 2.26. Vienna, Austria. Retrieved from https://mc-stan.org
Thurstone, L. (1927). A law of comparative judgment. Psychological Review, 34(4), 482–499. https://doi.org/10.1037/h0070288
Vehtari, A., Gabry, J., Magnusson, M., Yao, Y., Bürkner, P., Paananen, T., and Gelman, A. (2023). Loo: Efficient leave-one-out cross-validation and WAIC for bayesian models. Retrieved from https://mc-stan.org/loo/
Vehtari, A., Gelman, A., and Gabry, J. (2017). Practical bayesian model evaluation using leave-one-out cross-validation and WAIC. Statistics and Computing, 27, 1413–1432. https://doi.org/10.1007/s11222-016-9696-4
Vehtari, A., Simpson, D., Gelman, A., Yao, Y., and Gabry, J. (2021). Pareto smoothed importance sampling. Retrieved from https://arxiv.org/abs/1507.02646
Watanabe, S. (2013). A widely applicable bayesian information criterion. Journal of Machine Learning Research 14, 14, 867–897. Retrieved from https://dl.acm.org/doi/10.5555/2567709.2502609
Whitehill, T., and Chau, C. (2004). Single-word intelligibility in speakers with repaired cleft palate. Clinical Linguistics and Phonetics, 18, 341–355. https://doi.org/10.1080/02699200410001663344
Wickham, H. (2007). Reshaping data with the reshape package. Journal of Statistical Software, 21(12), 1–20. Retrieved from http://www.jstatsoft.org/v21/i12/
Wickham, H. (2022). Stringr: Simple, consistent wrappers for common string operations. Retrieved from https://CRAN.R-project.org/package=stringr
Wickham, Hadley, Averick, M., Bryan, J., Chang, W., McGowan, L. D., François, R., … Yutani, H. (2019). Welcome to the tidyverse. Journal of Open Source Software, 4(43), 1686. https://doi.org/10.21105/joss.01686
Wickham, Hadley, François, R., Henry, L., Müller, K., and Vaughan, D. (2023). Dplyr: A grammar of data manipulation. Retrieved from https://CRAN.R-project.org/package=dplyr

Footnotes

  1. For a thorough explanation of the Bayesian inferences procedures the reader can refer to Kruschke (2015) or McElreath (2020).↩︎

  2. The reader can refer to Brooks et al. (2011) for a detailed treatment on MCMC methods.↩︎

  3. An interested reader can further refer to McElreath (2020) for a detailed explanation of grid approximation.↩︎

  4. An interested reader can refer to McElreath (2020) for a detailed explanation of priors and hyperpriors.↩︎

  5. An interested reader can refer to McElreath [-McElreath (2020); p. 420] for a detailed explanation of the challenges and reparametrizations required to set priors with hyperparameters.↩︎

  6. An interested reader can refer to McElreath (2020), Gorinova et al. (2019) and Neal (2003)↩︎

  7. An interested reader can refer to McElreath (2020) for a detailed explanation of prior predictive simulations.↩︎

  8. The reader can refer to Brooks et al. (2011) for a detailed treatment on HMC methods.↩︎